[
https://issues.apache.org/jira/browse/ARROW-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525698#comment-16525698
]
Wes McKinney commented on ARROW-2744:
-------------------------------------
This is a tricky problem, since auto-normalizing would also have
microperformance implications that might be undesirable.
In some cases, null is actually an acceptable value. For example: consider a
nullable array with all non-null values. We (currently) consider it acceptable
for the validity bitmap to be null in this case.
This problem has come up enough that I think we should continue to investigate
all of the nuances so we know how best to proceed. Of your proposed
solutions, changing pyarrow to not yield lists with null buffers seems like a
reasonable fix. We should also fix parquet-cpp to not segfault on a null
buffer.
> [Python] Writing to parquet crashes when writing a ListArray of empty lists
> ----------------------------------------------------------------------------
>
> Key: ARROW-2744
> URL: https://issues.apache.org/jira/browse/ARROW-2744
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Environment: Python Version: 3.6.3 (Anaconda)
> OS: OSX and Linux
> Reporter: Anton Daitche
> Assignee: Antoine Pitrou
> Priority: Major
> Fix For: 0.10.0
>
>
> When writing a ListArray which contains only empty lists to Parquet, pyarrow
> crashes. Here is a minimal code snippet which reproduces the crash:
> {code:python}
> import pyarrow as pa
> from pyarrow import parquet as pq
> array = pa.array([[]], type=pa.list_(pa.int32()))
> table = pa.Table.from_arrays([array], ["A"])
> pq.write_table(table, "tmp.parq")
> {code}
> When the ListArray has at least one non-empty list, the issue disappears.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)