Joris Van den Bossche created ARROW-7638:
--------------------------------------------

             Summary: [Python] Segfault when inspecting dataset.Source with invalid file/partitioning
                 Key: ARROW-7638
                 URL: https://issues.apache.org/jira/browse/ARROW-7638
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Joris Van den Bossche


Getting a segfault with:

{code}
In [1]: import pyarrow.dataset as ds

In [2]: !touch test_empty.txt

In [3]: source_factory = ds.source("test_empty.txt",
   ...:     partitioning=ds.partitioning(field_names=['a', 'b']))

In [4]: source_factory.inspect()
Segmentation fault (core dumped)
{code}

I haven't yet investigated the cause further. Several things are "wrong" 
in this example: the empty file is not a valid file for the Parquet format, 
the partitioning does not match the file path, etc.
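
To check for the crash without taking down the interactive session, one option is to run the reproducer above in a child interpreter and look at its exit status. This is only a sketch: run_isolated and crashed_by_signal are hypothetical helper names, the REPRO string is the snippet above rewritten as a script, and the negative return code check assumes a POSIX system.

```python
import subprocess
import sys
import textwrap

# The reproducer from this report, rewritten as a plain script.
# ds.source(...) is used exactly as in the session above.
REPRO = textwrap.dedent("""
    import pathlib
    import pyarrow.dataset as ds

    pathlib.Path("test_empty.txt").touch()
    source_factory = ds.source(
        "test_empty.txt",
        partitioning=ds.partitioning(field_names=["a", "b"]),
    )
    source_factory.inspect()
""")


def run_isolated(code):
    """Run `code` in a fresh interpreter and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


def crashed_by_signal(result):
    # Assumption: POSIX. There, subprocess reports a child killed by a
    # signal as a negative return code (the negated signal number), while
    # an uncaught Python exception gives a positive exit code.
    return result.returncode < 0
```

Calling run_isolated(REPRO) and passing the result to crashed_by_signal should then distinguish a hard crash from an ordinary Python exception, and result.stderr keeps whatever the child printed before it exited.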



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
