[
https://issues.apache.org/jira/browse/ARROW-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907180#comment-16907180
]
Joris Van den Bossche commented on ARROW-5952:
----------------------------------------------
I looked into this a bit. The main cause is that, for a chunked
DictionaryArray, we need to access the first chunk to get the dictionary
values (those now live on the array object instead of on the type). As a
result, the code that converts a chunked dictionary array to pandas is not
robust against zero chunks, which is what an empty table produces.
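
To illustrate the failure mode, here is a minimal sketch in plain Python. It is
not Arrow's actual conversion code: `DictChunk` and `decode_chunks` are
hypothetical names made up for this example, and it assumes for simplicity that
all chunks share the same dictionary. The point is only that reading the
dictionary from the first chunk needs a guard for the zero-chunk case.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DictChunk:
    """Hypothetical stand-in for one dictionary-encoded chunk."""
    indices: List[int]
    dictionary: List[str]  # the values live on the chunk, not on the type


def decode_chunks(chunks: Sequence[DictChunk]) -> tuple:
    """Return (codes, categories) for a possibly empty sequence of chunks."""
    if len(chunks) == 0:
        # Guard: with zero chunks there is no chunks[0] to take the
        # dictionary from, so return an empty result instead.
        return [], []
    # Assumption for this sketch: every chunk uses the same dictionary.
    categories = chunks[0].dictionary
    codes: List[int] = []
    for chunk in chunks:
        codes.extend(chunk.indices)
    return codes, categories
```

Without the guard, `chunks[0]` raises an `IndexError` in Python; in native code
the equivalent unchecked access can read invalid memory, which would be
consistent with the segfault reported here.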
> [Python] Segfault when reading empty table with category as pandas dataframe
> ----------------------------------------------------------------------------
>
> Key: ARROW-5952
> URL: https://issues.apache.org/jira/browse/ARROW-5952
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.0, 0.14.1
> Environment: Linux 3.10.0-327.36.3.el7.x86_64
> Python 3.6.8
> Pandas 0.24.2
> Pyarrow 0.14.0
> Reporter: Daniel Nugent
> Priority: Major
> Fix For: 0.15.0
>
>
> I have two short sample programs which demonstrate the issue:
> {code:python}
> import pyarrow as pa
> import pandas as pd
> # Empty DataFrame with a single categorical column
> empty = pd.DataFrame({'foo': []}, dtype='category')
> table = pa.Table.from_pandas(empty)
> # Write the empty table to an Arrow file named 'bar'
> outfile = pa.output_stream('bar')
> writer = pa.RecordBatchFileWriter(outfile, table.schema)
> writer.write(table)
> writer.close()
> {code}
> {code:python}
> import pyarrow as pa
> # Reading the file written above back into pandas crashes the interpreter
> pa.ipc.open_file('bar').read_pandas()
> # Output: Segmentation fault
> {code}
> My apologies if this was already reported elsewhere. I searched but could not
> find an issue that seemed to refer to the same behavior.