Joris Van den Bossche created ARROW-7638:
--------------------------------------------
Summary: [Python] Segfault when inspecting dataset.Source with
invalid file/partitioning
Key: ARROW-7638
URL: https://issues.apache.org/jira/browse/ARROW-7638
Project: Apache Arrow
Issue Type: Bug
Reporter: Joris Van den Bossche
Getting a segfault with:
{code}
In [1]: import pyarrow.dataset as ds
In [2]: !touch test_empty.txt
In [3]: source_factory = ds.source("test_empty.txt",
partitioning=ds.partitioning(field_names=['a', 'b']))
In [4]: source_factory.inspect()
Segmentation fault (core dumped)
{code}
I haven't yet investigated the cause further. Several things are "wrong"
here: the empty file is not a valid Parquet file, the partitioning does not
match the file path, etc.
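
Since the reproduction kills the interpreter, it can help to run it in a child process while investigating. The following is only a minimal sketch, not part of the original report: `run_isolated` and `REPRO` are hypothetical names, the `ds.source(...)` call is copied from the session above and may not exist in other pyarrow versions, and the "negative return code means killed by a signal" check assumes a POSIX system.

```python
import subprocess
import sys

# Reproduction from the report, kept as a string so that it only ever runs
# in a child process. ds.source(...) is the API as used in the report
# (assumption: it is available in the installed pyarrow version).
REPRO = """
import pathlib
import pyarrow.dataset as ds

pathlib.Path("test_empty.txt").touch()  # same effect as `!touch test_empty.txt`
source_factory = ds.source(
    "test_empty.txt",
    partitioning=ds.partitioning(field_names=["a", "b"]),
)
source_factory.inspect()
"""


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter.

    Returns (returncode, killed_by_signal).
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a return code of -N means the child was terminated by
    # signal N (for example -11 for SIGSEGV).
    return proc.returncode, proc.returncode < 0
```

Calling `run_isolated(REPRO)` would then report whether the child was killed by a signal, without taking down the session that launched it.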