hi Alex,

No, if you look at the implementation in
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression_lz4.cc#L32
it is not using the same LZ4 compression style that Hadoop is using;
realistically we need to add a bunch of options to Lz4Codec to be able
to select what we want (or add an LZ4_FRAMED codec). I'll have to dig in
my e-mail to find the prior thread.

- Wes

On Tue, Aug 7, 2018 at 11:45 AM, ALeX Wang <[email protected]> wrote:
> Hi Wes,
>
> Are you talking about this?
> http://mail-archives.apache.org/mod_mbox/arrow-issues/201805.mbox/%[email protected]%3E
>
> I tried to compile with the latest arrow, which contains this fix, and still
> encountered the corruption error.
>
> Also, we tried to read the file using pyparquet and spark; that did not work
> either.
>
> Thanks,
> Alex Wang,
>
>
> On Tue, 7 Aug 2018 at 08:37, Wes McKinney <[email protected]> wrote:
>
>> hi Alex,
>>
>> I think there was an e-mail thread or JIRA about this, would have to
>> dig it up. LZ4 compression was originally underspecified (has that
>> been fixed?) and we aren't using the correct compressor/decompressor
>> options in parquet-cpp at the moment. If you have time to dig in and
>> fix it, it would be much appreciated. Note that the LZ4 code lives in
>> Apache Arrow.
>>
>> - Wes
>>
>> On Tue, Aug 7, 2018 at 11:10 AM, ALeX Wang <[email protected]> wrote:
>> > Hi,
>> >
>> > Would like to kindly confirm my observation,
>> >
>> > We use parquet-mr (java) to generate parquet files with LZ4 compression.
>> > To do this we have to compile/install the hadoop native library, which
>> > provides the LZ4 codec.
>> >
>> > However, the generated parquet file is not recognizable by parquet-cpp. I
>> > encountered the following error when using the `tools/parquet_reader` binary,
>> >
>> > ```
>> > Parquet error: Arrow error: IOError: Corrupt Lz4 compressed data.
>> > ```
>> >
>> > Further searching online got me to this JIRA ticket:
>> > https://issues.apache.org/jira/browse/HADOOP-12990
>> >
>> > So, since hadoop LZ4 is incompatible with open source, parquet-mr lz4 is
>> > not compatible with parquet-cpp?
>> >
>> > Thanks,
>> > --
>> > Alex Wang,
>> > Open vSwitch developer
>> >
>
> --
> Alex Wang,
> Open vSwitch developer
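
To illustrate the kind of mismatch discussed in this thread, here is a
rough sketch of why a plain LZ4 block decoder could report corrupt data
on Hadoop-written pages. It assumes (this is not stated in the thread)
that the Hadoop codec prefixes each raw LZ4 block with two 4-byte
big-endian integers, the decompressed size followed by the compressed
size, and that each frame carries exactly one compressed block. The
function name `split_hadoop_lz4_frames` is hypothetical. The sketch only
peels off the headers; the returned blocks would still need to go
through a raw LZ4 block decompressor.

```python
import struct


def split_hadoop_lz4_frames(buf: bytes):
    """Split a buffer into (decompressed_size, raw_lz4_block) pairs.

    Assumed layout per frame (an assumption, not confirmed by the thread):
      bytes 0..3  big-endian uint32, decompressed size
      bytes 4..7  big-endian uint32, compressed size
      bytes 8..   raw LZ4 block of `compressed size` bytes
    """
    frames = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 8:
            raise ValueError("truncated frame header")
        decompressed_size, compressed_size = struct.unpack_from(">II", buf, offset)
        offset += 8
        if compressed_size > len(buf) - offset:
            raise ValueError("frame claims more bytes than are available")
        frames.append((decompressed_size, buf[offset:offset + compressed_size]))
        offset += compressed_size
    return frames


# Build a fake two-frame buffer; the payload bytes are placeholders,
# not real LZ4 data.
example = struct.pack(">II", 5, 3) + b"abc" + struct.pack(">II", 2, 1) + b"z"
frames = split_hadoop_lz4_frames(example)
```

A decoder that feeds the whole buffer, headers included, straight into a
raw block decompressor would interpret the 8 header bytes as LZ4 tokens,
which is one plausible way to end up with a "Corrupt Lz4 compressed
data" error.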
