pmarks opened a new issue, #8826:
URL: https://github.com/apache/arrow-rs/issues/8826
I have a collection of parquet files that started throwing
ParquetError::General("Unexpected list/set element type0") when parsing
FileMetaData. They hit [this error
path](https://github.com/apache/arrow-rs/blob/5a1a13a7b39cef7ee71011a1f42f11338e6acd5d/parquet/src/parquet_thrift.rs#L225),
because the list header for
FileMetaData.row_groups.columns.meta_data.key_value_metadata has an element type
of 0 and a length of 0. The spec allows an
[empty map](https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md#map)
to be encoded with an element type of 0, but doesn't seem to explicitly allow the
same for lists. So perhaps these files are technically out of
spec, but I haven't yet encountered a reader that rejected them, and I have
used a variety of tools with these files. The parquet files were created with
"fastparquet-python version 2024.2.0 (build 0)".
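
To illustrate the case in question, here is a minimal sketch of a lenient compact-protocol list-header parse. It is not the arrow-rs code: `ListHeader` and `read_list_header` are hypothetical names, and it assumes the usual layout (size in the high nibble, element type in the low nibble, a varint size following when the nibble is 15) with element type IDs 1 through 12. The only point is that a header byte of `0x00` (type 0, length 0) is accepted as an empty list, while type 0 with a non-zero length is still rejected.

```rust
/// Parsed compact-protocol list header (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
struct ListHeader {
    elem_type: u8,   // low nibble of the first byte
    len: usize,      // number of elements in the list
    consumed: usize, // bytes read from the input
}

/// Lenient list-header parse: tolerates element type 0 only when the list is empty.
fn read_list_header(buf: &[u8]) -> Result<ListHeader, String> {
    let first = *buf.first().ok_or_else(|| "empty input".to_string())?;
    let elem_type = first & 0x0F;
    let mut len = (first >> 4) as usize;
    let mut consumed = 1;

    if len == 15 {
        // Long form: the real size follows as an unsigned LEB128 varint.
        len = 0;
        let mut shift = 0u32;
        loop {
            let byte = *buf
                .get(consumed)
                .ok_or_else(|| "truncated varint".to_string())?;
            consumed += 1;
            if shift >= usize::BITS {
                return Err("varint too long".to_string());
            }
            len |= ((byte & 0x7F) as usize) << shift;
            shift += 7;
            if byte & 0x80 == 0 {
                break;
            }
        }
    }

    match (elem_type, len) {
        // Assumed set of valid element type IDs (BOOL through STRUCT).
        (1..=12, _) => {}
        // The case from these files: nothing to decode, so the type is irrelevant.
        (0, 0) => {}
        (t, _) => return Err(format!("Unexpected list/set element type {t}")),
    }

    Ok(ListHeader { elem_type, len, consumed })
}
```

With this sketch, `read_list_header(&[0x00])` returns an empty list header, whereas `read_list_header(&[0x10])` (type 0, length 1) still returns an error.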
I don't have an easy way to share the files, but one can be downloaded
[here](https://cf.10xgenomics.com/samples/xenium/1.5.0/Xenium_V1_hPancreas_nondiseased_section/Xenium_V1_hPancreas_nondiseased_section_outs.zip)
(*NOTE: 6GB download*). Extract the `transcripts.parquet` file from the zip
archive.
I bisected this crash to #8530. @etseidl