rok commented on issue #34510: URL: https://github.com/apache/arrow/issues/34510#issuecomment-2109768275
> What if applications would use custom metadata to hold the schema and tensor type while writing only the storage values (floats, for example) in Parquet files? It would need some custom logic to reconstruct the tensor when reading, but it might be a good alternative (the buffers should still be the same after the read, not copied).

I think `FixedShapeTensor` gets stored as `FixedSizeList` plus some metadata, so the overhead comes from storing `FixedSizeList`. I'm not sure, but maybe there's a clean way to cast `FixedSizeList` to `FixedSizeBinary` (or similar) when writing `FixedShapeTensor`, and to do the inverse on reading. I don't think we have a clean option here, though. Given the current activity in the Parquet community, it might be worth proposing that `FixedSizeList` be added to Parquet.

I also wonder whether optimized take (https://github.com/apache/arrow/issues/39798) would improve the performance somewhat once all the PRs land.
