pitrou commented on pull request #168: URL: https://github.com/apache/parquet-format/pull/168#issuecomment-796687561
Ok, distilling here the feedback from Yann Collet (the author of LZ4): the frame format is beneficial as it provides a standard encoding for the compressed and uncompressed size of the data. This allows various tools to consume LZ4-compressed files without needing any sideband metadata about data sizes. My take: Parquet already encodes the compressed and uncompressed size separately, and the data blocks are embedded deep inside the Parquet format, so the benefits of the frame format wouldn't apply to us.
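
To make the point about sideband sizes concrete, here is a minimal sketch of a container that records both sizes itself and stores the payload as a bare compressed stream. It rests on several assumptions: the two-field header layout and the names `HEADER`, `write_page` and `read_page` are hypothetical and are not Parquet's actual page header encoding, and raw DEFLATE from Python's standard `zlib` module stands in for a raw LZ4 block, since LZ4 is not in the standard library.

```python
import struct
import zlib

# Hypothetical page header: compressed size, then uncompressed size, as two
# little-endian uint32 values. This is NOT Parquet's real header encoding.
HEADER = struct.Struct("<II")


def write_page(raw: bytes) -> bytes:
    """Compress `raw` and prefix it with the two sizes."""
    # Raw DEFLATE (wbits=-15) is a stand-in for a raw LZ4 block: the stream
    # has no header, checksum, or length field of its own.
    comp = zlib.compressobj(wbits=-15)
    payload = comp.compress(raw) + comp.flush()
    return HEADER.pack(len(payload), len(raw)) + payload


def read_page(buf: bytes, offset: int = 0):
    """Return (decompressed bytes, offset of the next page)."""
    compressed_size, uncompressed_size = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    payload = buf[start:start + compressed_size]
    # The container already knows the output size, so it can be used to
    # size the output buffer up front and to validate the result.
    raw = zlib.decompress(payload, wbits=-15, bufsize=max(uncompressed_size, 1))
    if len(raw) != uncompressed_size:
        raise ValueError("uncompressed size mismatch")
    return raw, start + compressed_size
```

Because the reader locates and sizes each payload from the container's own header, a self-describing frame around the payload would only repeat information the reader already has. A standalone `.lz4` file has no such header, which is the case the frame format is meant to serve.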