[jira] [Commented] (ARROW-5952) [Python] Segfault when reading empty table with category as pandas dataframe

Joris Van den Bossche (JIRA) Wed, 14 Aug 2019 04:16:39 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907180#comment-16907180
 ]


Joris Van den Bossche commented on ARROW-5952:
----------------------------------------------

I have been looking into this a bit. The main cause is that, for a chunked 
DictionaryArray, we need to access the first chunk to get the "dictionary 
values" (now that those live on the array object instead of on the type). As a 
result, the code for converting a chunked dictionary array to pandas is not 
robust against 0 chunks.
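
To make the failure mode concrete, here is a minimal pure-Python sketch of it. 
It is a toy model only, not pyarrow's actual conversion code: the names 
DictChunk, ChunkedDict, categories_unsafe and categories_safe are hypothetical 
and exist just for illustration. It assumes the dictionary values are stored 
per chunk, so a column with 0 chunks has nowhere to read them from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictChunk:
    """Toy stand-in for one dictionary-encoded chunk (hypothetical class)."""
    indices: List[int]
    dictionary: List[str]  # dictionary values live on the chunk, not on the type


@dataclass
class ChunkedDict:
    """Toy stand-in for a chunked dictionary column (hypothetical class)."""
    chunks: List[DictChunk] = field(default_factory=list)


def categories_unsafe(col: ChunkedDict) -> List[str]:
    # Assumes at least one chunk exists; fails when there are 0 chunks.
    return col.chunks[0].dictionary


def categories_safe(col: ChunkedDict) -> List[str]:
    # Guard for 0 chunks: fall back to an empty list of categories.
    if not col.chunks:
        return []
    return col.chunks[0].dictionary
```

In this Python model the unguarded lookup raises IndexError on an empty 
column; an equivalent unchecked access in native code could plausibly show up 
as a segfault instead. The guarded variant simply returns no categories, which 
is one possible way to handle the empty case.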

> [Python] Segfault when reading empty table with category as pandas dataframe
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-5952
>                 URL: https://issues.apache.org/jira/browse/ARROW-5952
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0, 0.14.1
>         Environment: Linux 3.10.0-327.36.3.el7.x86_64
> Python 3.6.8
> Pandas 0.24.2
> Pyarrow 0.14.0
>            Reporter: Daniel Nugent
>            Priority: Major
>             Fix For: 0.15.0
>
>
> I have two short sample programs which demonstrate the issue:
> {code:python}
> import pyarrow as pa
> import pandas as pd
> empty = pd.DataFrame({'foo':[]},dtype='category')
> table = pa.Table.from_pandas(empty)
> outfile = pa.output_stream('bar')
> writer = pa.RecordBatchFileWriter(outfile,table.schema)
> writer.write(table)
> writer.close()
> {code}
> {code:python}
> import pyarrow as pa
> pa.ipc.open_file('bar').read_pandas()
> # Output: Segmentation fault
> {code}
> My apologies if this was already reported elsewhere, I searched but could not 
> find an issue which seemed to refer to the same behavior.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
