[ 
https://issues.apache.org/jira/browse/ARROW-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-16231:
------------------------------------------
    Fix Version/s: 9.0.0

> [C++][Python] IPC failure for dictionary with extension type with struct 
> storage type
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-16231
>                 URL: https://issues.apache.org/jira/browse/ARROW-16231
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 9.0.0
>
>
> Report from [https://github.com/apache/arrow/issues/12899]
> Roundtripping through IPC/Feather fails for a dictionary type whose 
> dictionary values are an extension type with a nested storage type. Writing 
> appears to succeed (although it is unclear whether the written file is 
> "correct", since trying to read the schema gives an error), but reading the 
> file back fails with 
> _"ArrowInvalid: Ran out of field metadata, likely malformed"_.
> The original use case involved a pandas extension type: the pandas interval 
> dtype is mapped to an Arrow extension type with a struct type as storage, and 
> in this case the interval type was further wrapped in a categorical 
> (dictionary) type. A pandas-based test that reproduces this (it can be added 
> as-is to {{test_feather.py}}):
> {code:python}
> @pytest.mark.pandas
> def test_dictionary_interval():
>     df = pd.DataFrame({'a': pd.cut(range(1, 10, 3), [-1, 5, 10])})
>     _check_pandas_roundtrip(df, version=2)
> {code}
> This gives:
> {code:java}
> $ pytest python/pyarrow/tests/test_feather.py::test_dictionary_interval
> ....
> ========================= FAILURES =================
> ____________ test_dictionary_interval _______________
> pyarrow/_feather.pyx:88: in pyarrow._feather.FeatherReader.read
> E   pyarrow.lib.ArrowInvalid: Ran out of field metadata, likely malformed
> E   ../src/arrow/ipc/reader.cc:266  GetFieldMetadata(field_index_++, out_)
> E   ../src/arrow/ipc/reader.cc:283  LoadCommon(type_id)
> E   ../src/arrow/ipc/reader.cc:324  Load(child_fields[i].get(), parent->child_data[i].get())
> E   ../src/arrow/ipc/reader.cc:529  loader.Load(&field, column.get())
> E   ../src/arrow/ipc/reader.cc:1188  ReadRecordBatchInternal(*message->metadata(), schema_, field_inclusion_mask_, context, reader.get())
> E   ../src/arrow/ipc/feather.cc:730  reader->ReadRecordBatch(i)
> pyarrow/error.pxi:100: ArrowInvalid
> {code}
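
For readers unfamiliar with the error message: the traceback above shows the loader recursing into child fields and calling GetFieldMetadata(field_index_++, out_) once per field. The sketch below is a toy model of that pattern in plain Python, not Arrow's actual reader. The names Field, ToyLoader and RanOutOfFieldMetadata are hypothetical, and the assumption that the stored metadata lacks entries for the struct children is only for illustration; the report does not establish that this is the root cause.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field with optional children."""
    name: str
    children: List["Field"] = field(default_factory=list)


class RanOutOfFieldMetadata(Exception):
    """Raised when the loader needs more metadata entries than were stored."""


class ToyLoader:
    """Consumes one metadata entry per field, depth first (toy model only)."""

    def __init__(self, nodes):
        self.nodes = nodes  # flat list of per-field metadata (here: lengths)
        self.index = 0      # plays the role of field_index_ in the traceback

    def load(self, f):
        if self.index >= len(self.nodes):
            raise RanOutOfFieldMetadata(
                "Ran out of field metadata at field %r" % f.name)
        node = self.nodes[self.index]
        self.index += 1
        # Parent first, then each child in order.
        return {
            "name": f.name,
            "length": node,
            "children": [self.load(child) for child in f.children],
        }


# A struct-like field with two children needs three metadata entries.
interval = Field("interval", [Field("left"), Field("right")])

loaded = ToyLoader([2, 2, 2]).load(interval)   # enough entries: succeeds

try:
    ToyLoader([2]).load(interval)              # children have no entries
    failed = False
except RanOutOfFieldMetadata:
    failed = True
```

In this model the second call fails as soon as the loader reaches the first child, which matches the shape of the traceback (the error is raised from a Load call on child_fields[i]).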



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
