[
https://issues.apache.org/jira/browse/ARROW-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061623#comment-17061623
]
Joris Van den Bossche commented on ARROW-7907:
----------------------------------------------
So a small reproducer that creates the table directly with a 0-chunk
ChunkedArray as a column (instead of getting it by slicing the table) still aborts:
{code}
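import pyarrow as pa
# Build the column directly from an empty list of chunks (no slicing involved)
table = pa.table({'a': pa.chunked_array([], type=pa.timestamp('us'))})
assert table.column('a').num_chunks == 0
table.to_pandas()  # aborts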
{code}
> [Python] Conversion to pandas of empty table with timestamp type aborts
> -----------------------------------------------------------------------
>
> Key: ARROW-7907
> URL: https://issues.apache.org/jira/browse/ARROW-7907
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.17.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Creating an empty table:
> {code}
> In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
>
> In [2]: table['a']
> Out[2]:
> <pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
> [
> []
> ]
>
> In [3]: table.to_pandas()
> Out[3]:
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
> The above works, but the ChunkedArray still has 1 empty chunk. When filtering
> data, you can actually end up with no chunks at all, and then the conversion fails:
> {code}
> In [4]: table2 = table.slice(0, 0)
>
> In [5]: table2['a']
> Out[5]:
> <pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
> [
> ]
>
> In [6]: table2.to_pandas()
> ../src/arrow/table.cc:48: Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> ...
> Aborted (core dumped)
> {code}
> This seems to happen specifically for the timestamp type, and specifically
> with a non-ns unit (e.g. with us as above, which is the default in Arrow).
> I noticed this when reading a Parquet file of the taxi dataset, where the
> filter I used resulted in an empty batch.