patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660257971
> However, on the read side, you must ideally be able to ingest both kinds of input (Hadoop and non-Hadoop LZ4), so as to be maximally compatible with existing files.

Do you know if there is a 100% reliable way to differentiate between a Parquet file written with Hadoop LZ4 and one written with non-Hadoop (raw) LZ4? My understanding is that we can read the first 8 bytes and check whether they make sense for the remaining part of the buffer. However, I don't know if this is guaranteed to work 100% of the time.
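To make the "read the first 8 bytes" idea concrete, here is a minimal sketch of that heuristic. It assumes the Hadoop LZ4 codec's block layout, where each block starts with two big-endian 32-bit integers (the decompressed size and the compressed size of the first chunk); the function name and the exact sanity checks are illustrative, not part of this PR, and as discussed above the check is probabilistic rather than guaranteed:

```python
import struct

def looks_like_hadoop_lz4(buf: bytes) -> bool:
    """Heuristic guess: was `buf` written by the Hadoop LZ4 codec?

    Hadoop's codec prefixes each block with two big-endian uint32s:
    the decompressed size and the compressed size of the first chunk.
    If those 8 bytes are consistent with the buffer's actual length,
    the data was *probably* (not certainly) Hadoop-framed.
    """
    if len(buf) < 8:
        return False
    decompressed_size, compressed_size = struct.unpack(">II", buf[:8])
    # The first chunk's compressed payload cannot be larger than the
    # bytes remaining after the 8-byte header, and a sane block has a
    # non-zero decompressed size.
    return 0 < compressed_size <= len(buf) - 8 and decompressed_size > 0
```

A raw (non-Hadoop) LZ4 block can, by coincidence, begin with 8 bytes that also pass these checks, which is exactly why the approach cannot be 100% reliable; a fallback strategy (try one decoder, then the other on failure) is the usual mitigation.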
