Re: hadoop LZ4 incompatible with open source LZ4

ALeX Wang Tue, 07 Aug 2018 08:46:29 -0700

Hi Wes,

Are you talking about this?
http://mail-archives.apache.org/mod_mbox/arrow-issues/201805.mbox/%[email protected]%3E


I compiled against the latest Arrow, which contains this fix, and still
encountered the corruption error.

We also tried to read the file using pyparquet and Spark; neither worked.

Thanks,
Alex Wang,


On Tue, 7 Aug 2018 at 08:37, Wes McKinney <[email protected]> wrote:

> hi Alex,
>
> I think there was an e-mail thread or JIRA about this; I would have to
> dig it up. LZ4 compression was originally underspecified (has that
> been fixed?) and we aren't using the correct compressor/decompressor
> options in parquet-cpp at the moment. If you have time to dig in and
> fix it, it would be much appreciated. Note that the LZ4 code lives in
> Apache Arrow.
>
> - Wes
>
> On Tue, Aug 7, 2018 at 11:10 AM, ALeX Wang <[email protected]> wrote:
> > Hi,
> >
> > I would like to confirm my observation.
> >
> > We use parquet-mr (Java) to generate Parquet files with LZ4 compression. To
> > do this we have to compile/install the Hadoop native library, which provides
> > the LZ4 codec.
> >
> > However, the generated Parquet file is not recognized by parquet-cpp. I
> > encountered the following error when using the `tools/parquet_reader` binary:
> >
> > ```
> > Parquet error: Arrow error: IOError: Corrupt Lz4 compressed data.
> > ```
> >
> > Further searching online led me to this JIRA ticket:
> > https://issues.apache.org/jira/browse/HADOOP-12990
> >
> > So, since Hadoop LZ4 is incompatible with open source LZ4, is parquet-mr LZ4
> > not compatible with parquet-cpp?
> >
> > Thanks,
> > --
> > Alex Wang,
> > Open vSwitch developer
>


-- 
Alex Wang,
Open vSwitch developer
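
For readers who want to check a page buffer by hand, here is a rough sketch of
how the Hadoop framing could be inspected. It rests on an assumption that the
thread itself does not confirm: that the Hadoop codec prefixes each compressed
block with two 4-byte big-endian integers (uncompressed size, then compressed
size), and that each frame holds exactly one block. The function name
`split_hadoop_lz4_frames` is hypothetical and not part of parquet-cpp, Arrow,
or Hadoop.

```python
import struct

# Assumed frame header: two big-endian uint32 values,
# (uncompressed size, compressed size). This layout is an assumption.
HEADER = struct.Struct(">II")


def split_hadoop_lz4_frames(buf):
    """Split a buffer into (uncompressed_size, compressed_block) pairs.

    Hypothetical helper: it only parses the assumed length prefixes and
    does not decompress anything.
    """
    frames = []
    offset = 0
    while offset < len(buf):
        remaining = len(buf) - offset
        if remaining < HEADER.size:
            raise ValueError("truncated frame header at offset %d" % offset)
        uncompressed_size, compressed_size = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if compressed_size > len(buf) - offset:
            # A plain LZ4 block read as a header tends to land here,
            # because its first bytes decode to an implausible length.
            raise ValueError("frame claims %d bytes but only %d remain"
                             % (compressed_size, len(buf) - offset))
        frames.append((uncompressed_size, buf[offset:offset + compressed_size]))
        offset += compressed_size
    return frames


if __name__ == "__main__":
    # Placeholder payloads, not real LZ4 data; only the framing is exercised.
    demo = HEADER.pack(10, 3) + b"abc" + HEADER.pack(5, 2) + b"xy"
    for size, block in split_hadoop_lz4_frames(demo):
        print(size, len(block))
```

If the buffer parses cleanly, each extracted block could then be handed to a
raw LZ4 block decompressor together with its stated uncompressed size. If it
does not, the data may be plain LZ4 or framed in some other way.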
