[ https://issues.apache.org/jira/browse/PARQUET-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299516#comment-17299516 ]
ASF GitHub Bot commented on PARQUET-1996:
-----------------------------------------

pitrou commented on pull request #168:
URL: https://github.com/apache/parquet-format/pull/168#issuecomment-796687561

   Ok, distilling here the feedback from Yann Collet (the author of LZ4): the frame format is beneficial as it provides a standard encoding for the compressed and uncompressed size of the data. This allows various tools to consume LZ4-compressed files without needing any sideband metadata about data sizes.

   My take: Parquet already encodes the compressed and uncompressed size separately, and the data blocks are embedded deep inside the Parquet format, so the benefits of the frame format wouldn't apply to us.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [Format] Add interoperable LZ4 codec, deprecate existing LZ4 codec
> ------------------------------------------------------------------
>
>                 Key: PARQUET-1996
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1996
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-format
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> The current LZ4 codec is non-interoperable for reasons explained in detail
> on the parquet-dev mailing-list:
> https://mail-archives.apache.org/mod_mbox/parquet-dev/202102.mbox/%3c20210216151401.7647ce37@fsol%3e

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
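
To illustrate the "sideband metadata" point in the comment above, here is a rough sketch (not Parquet's actual encoding) of a container that records both sizes in its own header and stores only a raw, unframed compressed block. Several things here are assumptions made for illustration: the fixed two-field `HEADER` layout is hypothetical (the real Parquet page header is Thrift-encoded, though it does carry `uncompressed_page_size` and `compressed_page_size`), the helper names are invented, and raw DEFLATE from Python's standard `zlib` module stands in for a raw LZ4 block because LZ4 is not in the standard library.

```python
import struct
import zlib

# Hypothetical, simplified stand-in for a page header: two little-endian
# int32 fields (uncompressed size, compressed size). Not the real layout.
HEADER = struct.Struct("<ii")


def raw_compress(data: bytes) -> bytes:
    # Raw DEFLATE (wbits=-15) stands in for a raw LZ4 block:
    # no framing and no size information inside the block itself.
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def raw_decompress(block: bytes, uncompressed_size: int) -> bytes:
    # The caller supplies the expected size from the container header.
    out = zlib.decompress(block, wbits=-15, bufsize=max(uncompressed_size, 1))
    if len(out) != uncompressed_size:
        raise ValueError("decompressed size does not match header")
    return out


def write_page(data: bytes) -> bytes:
    # The container, not the codec, records both sizes.
    block = raw_compress(data)
    return HEADER.pack(len(data), len(block)) + block


def read_page(buf: bytes, offset: int = 0):
    # Returns the decompressed page and the offset of the next page.
    uncompressed_size, compressed_size = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    block = buf[start:start + compressed_size]
    return raw_decompress(block, uncompressed_size), start + compressed_size


if __name__ == "__main__":
    buf = write_page(b"parquet" * 100) + write_page(b"lz4" * 50)
    first, next_offset = read_page(buf)
    second, end = read_page(buf, next_offset)
    print(len(first), len(second), end == len(buf))
```

Because the reader gets both sizes from the enclosing header, it can locate and decode each block without any framing inside the compressed payload. That is the situation the comment describes for Parquet. A standalone file has no such enclosing header, which is where a self-describing frame helps.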