[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

GitBox Fri, 17 Jul 2020 14:16:25 -0700


patrickpai commented on pull request #7789:
URL: https://github.com/apache/arrow/pull/7789#issuecomment-660341488



   @pitrou Feel free to take a look! Note that in the most recent commit, if we 
try to decompress a Parquet file that was written using Hadoop Lz4Codec but is 
corrupted past the initial 8 bytes, it is possible (though unlikely) for the 
fallback to succeed anyway.
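
   For illustration, here is a minimal sketch of the kind of prefix check the 
paragraph above refers to. It is not the code from the PR. It assumes the 
Hadoop framing is an 8-byte prefix holding two big-endian 32-bit integers 
(decompressed size, then compressed size) followed by a single LZ4 block, and 
that the caller already knows the expected decompressed size. The names 
`HadoopLz4Prefix`, `LoadBigEndian32` and `TryParseHadoopPrefix` are 
hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical names throughout; this is a sketch, not the PR's implementation.
// Assumed framing: 4-byte big-endian decompressed size, 4-byte big-endian
// compressed size, then one LZ4 block.
constexpr std::size_t kHadoopPrefixLength = 8;

struct HadoopLz4Prefix {
  std::uint32_t decompressed_size;  // bytes 0..3 of the buffer
  std::uint32_t compressed_size;    // bytes 4..7 of the buffer
};

// Reads a 32-bit big-endian integer from four consecutive bytes.
inline std::uint32_t LoadBigEndian32(const std::uint8_t* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Returns the parsed prefix when the buffer plausibly uses the assumed Hadoop
// framing. Returns std::nullopt when the caller should fall back to treating
// the whole buffer as a raw LZ4 block.
inline std::optional<HadoopLz4Prefix> TryParseHadoopPrefix(
    const std::uint8_t* input, std::size_t input_len,
    std::size_t expected_output_len) {
  if (input_len < kHadoopPrefixLength) {
    return std::nullopt;  // too short to contain the prefix
  }
  const HadoopLz4Prefix prefix{LoadBigEndian32(input),
                               LoadBigEndian32(input + 4)};
  // Heuristic: both recorded sizes must agree with what the caller sees.
  if (prefix.compressed_size != input_len - kHadoopPrefixLength) {
    return std::nullopt;
  }
  if (prefix.decompressed_size != expected_output_len) {
    return std::nullopt;
  }
  return prefix;
}
```

   A check like this only inspects the first 8 bytes. If the prefix is intact 
but the payload after it is corrupted, decoding the framed payload fails and a 
decoder that then retries the whole buffer as raw LZ4 could, in rare cases, 
succeed.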
   
   If we prefer 100% correctness for Parquet files written using Hadoop 
Lz4Codec, then we might want to use 
https://github.com/apache/arrow/pull/7789/commits/7b94333f8432d54d08f6924965d51b2142158fa4
 instead.
   
   The one change that's missing is to re-enable writing files with lz4 
compression. Any thoughts on whether that should be part of this PR or if I 
should make another PR? I know the 1.0.0 release is going on right now, so it 
may be a while before we're sure the release is stable enough to re-enable 
writing with lz4.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
