patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660257971
> However, on the read side, you must ideally be able to ingest both kinds of input (Hadoop and non-Hadoop LZ4), so as to be maximally compatible with existing files.

Do you know if there is a 100% reliable way to differentiate between a Parquet file written with Hadoop LZ4 and one written with non-Hadoop (raw) LZ4? My understanding is that we can read the first 8 bytes and check whether they make sense for the remaining part of the buffer. However, I don't know if this is guaranteed to work 100% of the time.
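To make the "read the first 8 bytes" idea concrete, here is a minimal sketch of that heuristic. It assumes the Hadoop LZ4 codec's block layout, where each block starts with two big-endian 32-bit integers (the decompressed size and the compressed size of the first chunk); the function name and the exact sanity checks are illustrative, not part of this PR, and as discussed above the check is probabilistic rather than guaranteed:

```python
import struct

def looks_like_hadoop_lz4(buf: bytes) -> bool:
    """Heuristic guess: was `buf` written by the Hadoop LZ4 codec?

    Hadoop's codec prefixes each block with two big-endian uint32s:
    the decompressed size and the compressed size of the first chunk.
    If those 8 bytes are consistent with the buffer's actual length,
    the data was *probably* (not certainly) Hadoop-framed.
    """
    if len(buf) < 8:
        return False
    decompressed_size, compressed_size = struct.unpack(">II", buf[:8])
    # The first chunk's compressed payload cannot be larger than the
    # bytes remaining after the 8-byte header, and a sane block has a
    # non-zero decompressed size.
    return 0 < compressed_size <= len(buf) - 8 and decompressed_size > 0
```

A raw (non-Hadoop) LZ4 block can, by coincidence, begin with 8 bytes that also pass these checks, which is exactly why the approach cannot be 100% reliable; a fallback strategy (try one decoder, then the other on failure) is the usual mitigation.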
