chairmank commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660275200

What do we think about making the `Decompress` method of the new Hadoop LZ4 codec fall back to the alternate implementation if it fails to decompress? Then the new Hadoop LZ4 codec could be used unconditionally, without trying to guess the Parquet writer version from the file metadata. There would be a performance cost when attempting to read data pages that were written with the incompatible LZ4 codec, but this may be acceptable because:

* after this change, only a minority of Parquet files will have this incompatible LZ4 compression
* LZ4 tends to error quickly when it tries and fails to read the first sequence (https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md#compressed-block-format)
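To make the proposed control flow concrete, here is a minimal Python sketch of the try-then-fall-back pattern (the real codec lives in Arrow's C++ code; the function names and the `b"HD"` framing check below are purely hypothetical stand-ins, not the actual Hadoop LZ4 frame format):

```python
# Hypothetical stand-in decompressors for illustration only.
def hadoop_framed_decompress(data: bytes) -> bytes:
    # Pretend the Hadoop-framed format starts with a b"HD" prefix;
    # a malformed frame raises quickly, as LZ4 tends to do on a bad
    # first sequence.
    if not data.startswith(b"HD"):
        raise ValueError("not a Hadoop-framed block")
    return data[2:]

def raw_block_decompress(data: bytes) -> bytes:
    # Stand-in for decompressing a raw LZ4 block.
    return data

def decompress_with_fallback(data: bytes) -> bytes:
    """Try the Hadoop-framed path first; on failure, fall back
    to the alternate (raw block) implementation."""
    try:
        return hadoop_framed_decompress(data)
    except ValueError:
        return raw_block_decompress(data)
```

The performance cost mentioned above is the wasted attempt in the `try` branch for files written with the incompatible codec, which is bounded by how quickly the first decompressor rejects malformed input.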
What do we think about making the Decompress method of the new hadoop lz4 codec fall back to alternate implementation if it fails to decompress? Then the new hadoop lz4 codec could be used unconditionally, without trying to guess Parquet writer version from the file metadata. There would be a performance cost when attempting to read data pages that were written with incompatible lz4 codec. But this may be acceptable, because * after this change, only a minority of Parquet files will have this incompatible lz4ccompression * lz4 tends to error quickly when it tries and fails to read the first sequence (https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md#compressed-block-format) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org