[ 
https://issues.apache.org/jira/browse/ARROW-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525698#comment-16525698
 ] 

Wes McKinney commented on ARROW-2744:
-------------------------------------

This is a tricky problem, since auto-normalizing would also have 
microperformance implications that might be undesirable. 

In some cases, null is actually an acceptable value. For example: consider a 
nullable array with all non-null values. We (currently) consider it acceptable 
for the validity bitmap to be null in this case. 
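
To make the "null bitmap means no nulls" convention concrete, here is a minimal pure-Python sketch. It is not the Arrow implementation: `is_valid` is a hypothetical helper, and the only fact it borrows from Arrow is that validity bits are stored least-significant-bit first within each byte.

```python
from typing import Optional


def is_valid(validity: Optional[bytes], i: int) -> bool:
    """Return True if slot i is non-null.

    Hypothetical helper: a missing bitmap (None) is read as
    "every slot is valid". Bits are least-significant-bit first.
    """
    if validity is None:
        return True
    return bool((validity[i // 8] >> (i % 8)) & 1)


# Three non-null values: no bitmap needs to be allocated at all.
no_bitmap = [is_valid(None, i) for i in range(3)]

# The same three values with an explicit bitmap 0b00000111.
full_bitmap = [is_valid(bytes([0b111]), i) for i in range(3)]

# Slot 1 is null: bitmap 0b00000101.
one_null = [is_valid(bytes([0b101]), i) for i in range(3)]
```

Readers that go through a check like this never dereference the missing bitmap, which is why a null validity buffer can be treated as legal input.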

This problem has come up enough that I think we should continue to investigate 
all of the nuances so we know how best to proceed. Of your proposed 
solutions, changing pyarrow to not yield lists with null buffers seems like a 
reasonable fix. We should also fix parquet-cpp to not segfault on a null 
buffer. 
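
The two fixes can be sketched roughly as follows, in plain Python rather than the real pyarrow or parquet-cpp code. Both function names are hypothetical. The buffer layout is also an assumption, with index 0 as the validity bitmap and every later index as a data buffer, since the comment does not say which buffer is null.

```python
from typing import List, Optional


def normalize_data_buffers(buffers: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Producer-side fix: replace missing data buffers with empty ones.

    Assumed layout: index 0 is the validity bitmap, which may stay None
    (meaning "no nulls"); every later slot is treated as a data buffer.
    """
    return [
        buf if i == 0 or buf is not None else b""
        for i, buf in enumerate(buffers)
    ]


def buffer_size(buf: Optional[bytes]) -> int:
    """Consumer-side guard: treat a missing buffer as zero bytes."""
    return 0 if buf is None else len(buf)


# An array whose only data buffer was never allocated.
raw = [None, None]
normalized = normalize_data_buffers(raw)
```

Doing both is deliberate: normalizing on the producer side avoids handing out null data buffers, while the guard keeps the consumer from crashing if one arrives anyway.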

> [Python] Writing to parquet crashes when writing a ListArray of empty lists 
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-2744
>                 URL: https://issues.apache.org/jira/browse/ARROW-2744
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>         Environment: Python Version: 3.6.3 (Anaconda)
> OS: OSX and Linux
>            Reporter: Anton Daitche
>            Assignee: Antoine Pitrou
>            Priority: Major
>             Fix For: 0.10.0
>
>
> When writing a ListArray which contains only empty lists to Parquet, Pyarrow 
> crashes. Here is a minimal code snippet which reproduces the crash:
> {code:python}
> import pyarrow as pa
> from pyarrow import parquet as pq
>
> # A ListArray containing a single empty list
> array = pa.array([[]], type=pa.list_(pa.int32()))
> table = pa.Table.from_arrays([array], ["A"])
> # Crashes on 0.9.0
> pq.write_table(table, "tmp.parq")
> {code}
> When the ListArray has at least one non-empty list, the issue disappears.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
