Rok Mihevc created PARQUET-2474:
-----------------------------------

             Summary: [Format] Specify FIXED_SIZE_LIST Logical type
                 Key: PARQUET-2474
                 URL: https://issues.apache.org/jira/browse/PARQUET-2474
             Project: Parquet
          Issue Type: New Feature
          Components: parquet-format
            Reporter: Rok Mihevc


[_Replicated from mailing list_|https://lists.apache.org/thread/xot5f3ghhtc82n1bf0wdl9zqwlrzqks3]

Arrow recently introduced the
[FixedShapeTensor|https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor]
and
[VariableShapeTensor|https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor]
canonical extension types. They use FixedSizeList and StructArray(List,
FixedSizeList) as storage, respectively. These types target machine learning
and scientific applications that work with large datasets and would benefit
from using Parquet as on-disk storage.
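FixedSizeList is a natural storage type for fixed-shape tensors. Each tensor with a given `shape` flattens, in row-major order, into exactly prod(shape) values. Row boundaries therefore follow from the list size alone, and no per-row offsets are needed.

The sketch below is a minimal pure-Python illustration of that layout. It is not Arrow's implementation. The helper names (`to_fixed_size_list`, `row`, `_flatten`) are hypothetical, and nested Python lists stand in for real Arrow arrays.

```python
from math import prod


def _flatten(x):
    """Recursively flatten a nested Python list in row-major order."""
    if isinstance(x, list):
        out = []
        for item in x:
            out.extend(_flatten(item))
        return out
    return [x]


def to_fixed_size_list(tensors, shape):
    """Pack same-shape tensors into one flat values buffer (hypothetical helper).

    Mimics the idea of FixedSizeList storage: every row holds exactly
    prod(shape) values, so only the list size needs to be recorded.
    """
    list_size = prod(shape)
    values = []
    for t in tensors:
        flat = _flatten(t)
        if len(flat) != list_size:
            raise ValueError(f"expected {list_size} elements, got {len(flat)}")
        values.extend(flat)
    return values, list_size


def row(values, list_size, i):
    """Return row i; no offsets buffer is needed, it starts at i * list_size."""
    start = i * list_size
    return values[start:start + list_size]
```

For example, two 2x3 tensors `[[1, 2, 3], [4, 5, 6]]` and `[[7, 8, 9], [10, 11, 12]]` pack into twelve values with a list size of 6. The second tensor is then simply the slice `values[6:12]`.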

However, FixedSizeList is currently stored as List in Parquet. This adds
significant conversion overhead on both read and write, as [discussed
here|https://github.com/apache/arrow/issues/34510]. It would therefore be
beneficial to introduce a FIXED_SIZE_LIST logical type to Parquet.
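The overhead comes from routing data through Parquet's variable-length list representation. The sketch below illustrates that round trip in simplified pure Python. It is not the code path of Arrow or of any Parquet library.

- **On write**, one repetition level per value has to be produced.
- **On read**, rows must be rebuilt from those levels, and their lengths must be checked before the data can be treated as fixed-size again.

A FIXED_SIZE_LIST logical type would, presumably, let readers and writers skip this work, because row boundaries would follow from the list size alone.

The function names are hypothetical. The sketch also makes several simplifying assumptions:

- no nulls
- no empty lists
- maximum repetition level 1
- definition levels ignored
- Parquet's RLE/bit-packed level encoding ignored

```python
def write_as_list(values, list_size):
    """Emulate writing a fixed-size list column as a variable-length list.

    Hypothetical helper. Returns (values, repetition_levels) under the
    simplifying assumptions: no nulls, no empty lists, max repetition
    level 1; definition levels and level encoding are ignored.
    """
    if list_size < 1:
        raise ValueError("list_size must be at least 1")
    if len(values) % list_size != 0:
        raise ValueError("values length must be a multiple of list_size")
    rep_levels = []
    for _ in range(len(values) // list_size):
        # Repetition level 0 starts a new list, 1 continues the current one.
        rep_levels.append(0)
        rep_levels.extend([1] * (list_size - 1))
    return values, rep_levels


def read_as_fixed_size(values, rep_levels):
    """Rebuild rows from repetition levels, then verify they share one size."""
    rows = []
    for v, r in zip(values, rep_levels):
        if r == 0:
            rows.append([v])
        else:
            rows[-1].append(v)
    sizes = {len(r) for r in rows}
    if len(sizes) > 1:
        raise ValueError(f"rows have differing lengths: {sorted(sizes)}")
    list_size = sizes.pop() if sizes else 0
    flat = [v for r in rows for v in r]
    return flat, list_size
```

In this sketch, a column of twelve values with a list size of 3 produces the repetition levels `[0, 1, 1]` repeated four times. Reading reverses that per value before the fixed size can be confirmed. The sketch counts only per-value work; it says nothing about encoded size on disk.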



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
