[
https://issues.apache.org/jira/browse/ARROW-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524230#comment-17524230
]
Joris Van den Bossche commented on ARROW-16231:
-----------------------------------------------
If I try to recreate this with a pure-pyarrow example, I get a different error:
{code}
import pyarrow as pa
from pyarrow.tests.test_extension_type import MyStructType
struct_array = pa.StructArray.from_arrays(
    [pa.array([0, 1], type="int64"), pa.array([1, 2], type="int64")],
    names=["left", "right"])
mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
dict_array = pa.DictionaryArray.from_arrays(pa.array([0, 1, 0]), mystruct_array)
# roundtrip through Feather
from pyarrow import feather
feather.write_feather(pa.table({'a': dict_array}),
                      "test_dict_ext_nested.feather")
feather.read_table("test_dict_ext_nested.feather")
{code}
gives
{code}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-df8b416670f4> in <module>
----> 1 feather.read_table("test_dict_ext_nested.feather")

~/scipy/repos/arrow/python/pyarrow/feather.py in read_table(source, columns, memory_map, use_threads)
    242     table : pyarrow.Table
    243     """
--> 244     reader = _feather.FeatherReader(
    245         source, use_memory_map=memory_map, use_threads=use_threads)
    246

~/scipy/repos/arrow/python/pyarrow/_feather.pyx in pyarrow._feather.FeatherReader.__cinit__()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/scipy/repos/arrow/python/pyarrow/types.pxi in pyarrow.lib.PyExtensionType.__arrow_ext_deserialize__()

TypeError: Expected storage type struct<left: int64, right: int64> but got dictionary<values=struct<left: int64, right: int64>, indices=int64, ordered=0>
{code}
> [C++][Python] IPC failure for dictionary with extension type with struct
> storage type
> -------------------------------------------------------------------------------------
>
> Key: ARROW-16231
> URL: https://issues.apache.org/jira/browse/ARROW-16231
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Joris Van den Bossche
> Priority: Major
>
> Report from [https://github.com/apache/arrow/issues/12899]
> Roundtripping through IPC/Feather using a dictionary type where the
> dictionary is an extension type with a nested storage type fails. Writing
> seems to work (although it is unclear whether the written file is "correct",
> since trying to read the schema gives an error), but reading it back fails with
> _"ArrowInvalid: Ran out of field metadata, likely malformed"_.
> The original use case came from a pandas extension type: the pandas interval
> dtype is mapped to an Arrow extension type with a struct type as storage, and
> in this case the interval type was further wrapped in a categorical
> (dictionary) type. A pandas-based test that reproduces this (it can be added
> as-is to {{test_feather.py}}):
> {code:python}
> @pytest.mark.pandas
> def test_dictionary_interval():
>     df = pd.DataFrame({'a': pd.cut(range(1, 10, 3), [-1, 5, 10])})
>     _check_pandas_roundtrip(df, version=2)
> {code}
> This gives:
> {code:java}
> $ pytest python/pyarrow/tests/test_feather.py::test_dictionary_interval
> ....
> ========================= FAILURES =================
> ____________ test_dictionary_interval _______________
> pyarrow/_feather.pyx:88: in pyarrow._feather.FeatherReader.read
> E pyarrow.lib.ArrowInvalid: Ran out of field metadata, likely malformed
> E ../src/arrow/ipc/reader.cc:266 GetFieldMetadata(field_index_++, out_)
> E ../src/arrow/ipc/reader.cc:283 LoadCommon(type_id)
> E ../src/arrow/ipc/reader.cc:324 Load(child_fields[i].get(), parent->child_data[i].get())
> E ../src/arrow/ipc/reader.cc:529 loader.Load(&field, column.get())
> E ../src/arrow/ipc/reader.cc:1188 ReadRecordBatchInternal(*message->metadata(), schema_, field_inclusion_mask_, context, reader.get())
> E ../src/arrow/ipc/feather.cc:730 reader->ReadRecordBatch(i)
> pyarrow/error.pxi:100: ArrowInvalid
> {code}