Rok Mihevc created PARQUET-2474:
-----------------------------------
Summary: [Format] Specify FIXED_SIZE_LIST Logical type
Key: PARQUET-2474
URL: https://issues.apache.org/jira/browse/PARQUET-2474
Project: Parquet
Issue Type: New Feature
Components: parquet-format
Reporter: Rok Mihevc
[_Replicated from mailing list_|https://lists.apache.org/thread/xot5f3ghhtc82n1bf0wdl9zqwlrzqks3]
Arrow recently introduced
[FixedShapeTensor|https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor]
and
[VariableShapeTensor|https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor]
canonical extension types that use FixedSizeList and StructArray(List,
FixedSizeList) as storage, respectively. These are targeted at machine learning
and scientific applications that deal with large datasets and would benefit
from using Parquet as on-disk storage.
However, FixedSizeList is currently stored as List in Parquet, which adds
significant conversion overhead when reading and writing, as [discussed
here|https://github.com/apache/arrow/issues/34510]. It would therefore be
beneficial to introduce a FIXED_SIZE_LIST logical type to Parquet.
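To make the overhead concrete, here is a minimal conceptual sketch in plain Python. It is not Arrow or Parquet code: the helper names fixed_to_offsets and offsets_to_fixed are hypothetical, and per-row offsets are used only as a stand-in for the per-row length information a variable-length list representation has to carry (Parquet itself encodes list boundaries with repetition and definition levels rather than offsets). The point is that a fixed-size list is fully described by a flat value buffer plus one integer, whereas a round trip through a variable-length representation has to generate the row boundaries on write and then re-derive and validate them on read.

```python
from typing import List, Sequence, Tuple


def fixed_to_offsets(num_rows: int, list_size: int) -> List[int]:
    """Build the row offsets a variable-length layout needs (hypothetical helper).

    Every row has exactly list_size items, so the offsets are fully
    predictable, yet they still have to be materialised and stored.
    """
    return [i * list_size for i in range(num_rows + 1)]


def offsets_to_fixed(offsets: Sequence[int]) -> Tuple[int, int]:
    """Recover (num_rows, list_size) from offsets (hypothetical helper).

    The reader cannot assume a fixed size, so it must scan every row
    boundary and check that all rows have the same length.
    """
    num_rows = len(offsets) - 1
    if num_rows <= 0:
        raise ValueError("need at least one row to infer a list size")
    list_size = offsets[1] - offsets[0]
    for start, end in zip(offsets, offsets[1:]):
        if end - start != list_size:
            raise ValueError("rows are not all the same length")
    return num_rows, list_size


# Four 3-element vectors stored as one flat buffer of 12 values.
values = [float(v) for v in range(12)]

# Write path: boundaries are generated even though they carry no information.
offsets = fixed_to_offsets(num_rows=4, list_size=3)

# Read path: the fixed size is re-derived and validated row by row.
num_rows, list_size = offsets_to_fixed(offsets)
rows = [values[i * list_size:(i + 1) * list_size] for i in range(num_rows)]
```

With a dedicated fixed-size type, only values and the single integer list_size would need to be kept, and both conversion passes above would disappear.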
--
This message was sent by Atlassian Jira
(v8.20.10#820010)