[
https://issues.apache.org/jira/browse/ARROW-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061622#comment-17061622
]
Joris Van den Bossche commented on ARROW-8142:
----------------------------------------------
[~fjetter] thanks for the report!
(We have several issues with such 0-chunk chunked arrays, e.g. also
ARROW-7907.)
A smaller reproducer without parquet (the parquet read generates a
ChunkedArray with 0 chunks instead of 1 chunk of length 0, but it is the
0-chunk array itself that causes the problem):
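{code}
import pyarrow as pa

typ = pa.dictionary(pa.int8(), pa.string())
typ2 = pa.string()
# a ChunkedArray with zero chunks (not one chunk of length 0)
arr = pa.chunked_array([], type=typ)
arr.cast(typ2)  # triggers the failure reported in this issue
{code}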
> [Python/C++] Casting empty table from after parquet roundtrip causes critical
> failure
> -------------------------------------------------------------------------------------
>
> Key: ARROW-8142
> URL: https://issues.apache.org/jira/browse/ARROW-8142
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Florian Jetter
> Priority: Major
> Fix For: 0.17.0
>
>
> When casting the schema of an empty table from a dict-encoded to a
> non-dict-encoded type, a critical error is raised and not handled, causing
> the interpreter to shut down.
> This only happens after a parquet roundtrip.
>
> {code:python}
> import pyarrow as pa
> import pandas as pd
> import pyarrow.parquet as pq
> df = pd.DataFrame({"col": ["a"]}).astype({"col": "category"}).iloc[:0]
> table = pa.Table.from_pandas(df)
> field = table.schema[0]
> new_field = pa.field(field.name, field.type.value_type, field.nullable,
> field.metadata)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> reader = pa.BufferReader(buf.getvalue().to_pybytes())
> table = pq.read_table(reader)
> schema = table.schema.remove(0).insert(0, new_field)
> new_table = table.cast(schema)
> assert new_table.schema == schema
> {code}
>
> Output
> {code:java}
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0318 09:55:14.266649 299722176 table.cc:47] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> {code}
>