[
https://issues.apache.org/jira/browse/ARROW-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523926#comment-16523926
]
Antoine Pitrou edited comment on ARROW-2744 at 6/26/18 4:14 PM:
----------------------------------------------------------------
Here is the crux of the issue:
{code}
>>> import pyarrow as pa
>>> arr = pa.array([[]], type=pa.list_(pa.int32()))
>>> arr.buffers()
[<pyarrow.lib.Buffer at 0x7f1b7f68f0d8>,
<pyarrow.lib.Buffer at 0x7f1b7f68f998>,
None,
None]
>>> arr = pa.array([[1]], type=pa.list_(pa.int32()))
>>> arr.buffers()
[<pyarrow.lib.Buffer at 0x7f1b93163378>,
<pyarrow.lib.Buffer at 0x7f1b7ee387d8>,
<pyarrow.lib.Buffer at 0x7f1b7ee38998>,
<pyarrow.lib.Buffer at 0x7f1b7ee385a8>]
{code}
There are two possible solutions:
# Fix parquet-cpp so that it accepts null data buffers
# Fix pyarrow so that it never generates list arrays with null data buffers
In general this issue with buffers possibly being null (or None in Python)
seems to regularly crop up in various places.
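As a minimal sketch of how such null buffers can be spotted from Python, the helper below reports which positions in a buffer list are None. The name null_buffer_indices is hypothetical and not part of pyarrow, and the example uses plain placeholder objects that mirror the two outputs above, since the buffers a given pyarrow version returns may differ.

```python
def null_buffer_indices(buffers):
    """Return the positions of entries that are None (a null buffer in C++).

    `buffers` is any sequence shaped like the result of `arr.buffers()`.
    This helper is hypothetical and not part of pyarrow.
    """
    return [i for i, buf in enumerate(buffers) if buf is None]


# Placeholders standing in for real pyarrow.lib.Buffer objects.
# Shape of the output shown above for pa.array([[]], ...):
empty_list_case = [object(), object(), None, None]
# Shape of the output shown above for pa.array([[1]], ...):
non_empty_case = [object(), object(), object(), object()]

print(null_buffer_indices(empty_list_case))
print(null_buffer_indices(non_empty_case))
```

With a real array, the call would be null_buffer_indices(arr.buffers()). A consumer such as parquet-cpp would need to treat the reported positions as zero-length data rather than dereference them.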
[~wesmckinn], what do you think?
> [Python] Writing to parquet crashes when writing a ListArray of empty lists
> ----------------------------------------------------------------------------
>
> Key: ARROW-2744
> URL: https://issues.apache.org/jira/browse/ARROW-2744
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.9.0
> Environment: Python Version: 3.6.3 (Anaconda)
> OS: OSX and Linux
> Reporter: Anton Daitche
> Assignee: Antoine Pitrou
> Priority: Major
> Fix For: 0.10.0
>
>
> When writing a ListArray that contains only empty lists to Parquet, pyarrow
> crashes. Here is a minimal code snippet that reproduces the crash:
> {code:python}
> import pyarrow as pa
> from pyarrow import parquet as pq
> array = pa.array([[]], type=pa.list_(pa.int32()))
> table = pa.Table.from_arrays([array], ["A"])
> pq.write_table(table, "tmp.parq"){code}
> When the ListArray has at least one non-empty list, the issue disappears.
>