rok opened a new pull request, #241:
URL: https://github.com/apache/parquet-format/pull/241

   As proposed in https://github.com/apache/arrow/issues/34510 and on 
[ML](https://lists.apache.org/thread/khco6z9kd1spxlokrjxhyy83x9ogvtdm), 
[PARQUET-2474](https://issues.apache.org/jira/browse/PARQUET-2474).
   
   Arrow recently introduced 
[FixedShapeTensor](https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor)
 and 
[VariableShapeTensor](https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor)
 canonical extension types that use FixedSizeList and StructArray(List, 
FixedSizeList) as storage, respectively. These are targeted at machine learning 
and scientific applications that deal with large datasets and would benefit 
from using Parquet as on-disk storage.
   
   However, FixedSizeList is currently stored as List in Parquet, which adds 
significant conversion overhead when reading and writing, as [discussed 
here](https://github.com/apache/arrow/issues/34510). It would therefore be 
beneficial to introduce a FIXED_SIZE_LIST logical type to Parquet.
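
   To illustrate where the overhead comes from, here is a minimal sketch in 
plain Python. It is not Parquet's or Arrow's actual implementation: it assumes 
non-nullable, non-empty lists, models only repetition levels (definition 
levels are ignored), and all function names are hypothetical. A List column 
needs one level per value to recover row boundaries, while a fixed-size list 
can be rebuilt from the flat values and the list size alone.

```python
# Illustrative sketch only. Simplified model of nested-list encoding:
# non-nullable rows, non-nullable elements, no empty rows.
# All function names here are hypothetical, not a real Parquet API.


def encode_as_list(rows):
    """Flatten rows into values plus one repetition level per value.

    Level 0 marks the first element of a new row, level 1 a continuation.
    """
    values, rep_levels = [], []
    for row in rows:
        for i, value in enumerate(row):
            values.append(value)
            rep_levels.append(0 if i == 0 else 1)
    return values, rep_levels


def decode_list(values, rep_levels):
    """Rebuild rows by inspecting the repetition level of every value."""
    rows = []
    for value, level in zip(values, rep_levels):
        if level == 0:
            rows.append([value])
        else:
            rows[-1].append(value)
    return rows


def encode_as_fixed_size_list(rows, list_size):
    """Flatten rows into values only; row boundaries follow from list_size."""
    values = []
    for row in rows:
        if len(row) != list_size:
            raise ValueError("every row must have exactly list_size elements")
        values.extend(row)
    return values


def decode_fixed_size_list(values, list_size):
    """Rebuild rows by slicing at multiples of list_size; no levels needed."""
    return [values[i:i + list_size] for i in range(0, len(values), list_size)]


rows = [[1, 2, 3], [4, 5, 6]]

values, rep_levels = encode_as_list(rows)
assert decode_list(values, rep_levels) == rows

flat = encode_as_fixed_size_list(rows, list_size=3)
assert decode_fixed_size_list(flat, list_size=3) == rows
```

   In this simplified model the List path writes and reads an extra level for 
every value, whereas the fixed-size path only carries the values and a single 
integer.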


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


