Hi Alex, I think there was an e-mail thread or JIRA about this; I would have to dig it up. LZ4 compression was originally underspecified (has that been fixed?), and we aren't using the correct compressor/decompressor options in parquet-cpp at the moment. If you have time to dig in and fix it, it would be much appreciated. Note that the LZ4 code lives in Apache Arrow.
- Wes

On Tue, Aug 7, 2018 at 11:10 AM, ALeX Wang <ee07b...@gmail.com> wrote:
> Hi,
>
> Would like to kindly confirm my observation,
>
> We use parquet-mr (java) to generate parquet file with LZ4 compression. To
> do this we have to compile/install hadoop native library with provides LZ4
> codec.
>
> However, the generated parquet file, is not recognizable by parquet-cpp. I
> encountered following error when using the `tools/parquet_reader` binary,
>
> ```
> Parquet error: Arrow error: IOError: Corrupt Lz4 compressed data.
> ```
>
> Further search online get me to this JIRA ticket:
> https://issues.apache.org/jira/browse/HADOOP-12990
>
> So, since hadoop LZ4 is incompatible with open source, parquet-mr lz4 is
> not compatible with parquet-cpp?
>
> Thanks,
> --
> Alex Wang,
> Open vSwitch developer
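
For anyone digging into the mismatch, here is a rough sketch of what the framing difference might look like. It assumes, as far as I understand the Hadoop codec, that each raw LZ4 block is prefixed by two 4-byte big-endian integers (uncompressed length, then compressed length), which a plain LZ4 block decompressor would misread as corrupt data. The function name `split_hadoop_lz4_frames` is hypothetical and is not part of parquet-cpp or Arrow; the sketch only splits the frames and does not decompress them.

```python
import struct


def split_hadoop_lz4_frames(buf: bytes):
    """Split a buffer into (uncompressed_len, raw_lz4_block) pairs.

    Assumed layout (not confirmed in this thread), repeated until the
    buffer is exhausted:
        4 bytes  big-endian uncompressed length
        4 bytes  big-endian compressed length
        N bytes  raw LZ4 block, N == compressed length
    """
    frames = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < 8:
            raise ValueError("truncated frame header")
        uncompressed_len, compressed_len = struct.unpack_from(">II", buf, pos)
        pos += 8
        if compressed_len > len(buf) - pos:
            raise ValueError("compressed length exceeds remaining input")
        # The payload would then go to a raw LZ4 block decompressor.
        frames.append((uncompressed_len, buf[pos:pos + compressed_len]))
        pos += compressed_len
    return frames
```

If the lengths do not line up with the buffer, the data is probably not in this framing, and falling back to treating the whole buffer as a single raw LZ4 block would be one option.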