hi Alex,

No, if you look at the implementation in
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/compression_lz4.cc#L32
it is not using the same LZ4 compression style that Hadoop is using.
Realistically, we need to add a bunch of options to Lz4Codec to be able
to select what we want (or add an LZ4_FRAMED codec). I'll have to dig
through my e-mail to find the prior thread.
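
To make the mismatch concrete, here is a rough sketch of how a reader
could guess which LZ4 layout a buffer uses before decompressing it. It
rests on two assumptions: that the Hadoop codec prefixes each raw LZ4
block with two 4-byte big-endian sizes (uncompressed, then compressed),
and that a page holds a single such block. GuessLz4Layout and Lz4Layout
are hypothetical names, not Arrow or parquet-cpp APIs, and the check is
only a heuristic.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical classification of an LZ4 payload (not an Arrow API).
enum class Lz4Layout { kFrame, kHadoopBlock, kRawBlockOrUnknown };

// Read a 32-bit big-endian unsigned integer from p.
inline uint32_t ReadBE32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         static_cast<uint32_t>(p[3]);
}

inline Lz4Layout GuessLz4Layout(const uint8_t* data, size_t len) {
  // The LZ4 frame format starts with the magic number 0x184D2204,
  // stored little-endian.
  if (len >= 4 && data[0] == 0x04 && data[1] == 0x22 &&
      data[2] == 0x4D && data[3] == 0x18) {
    return Lz4Layout::kFrame;
  }
  // Assumed Hadoop layout:
  //   [4-byte BE uncompressed size][4-byte BE compressed size][raw block]
  // Only the single-block case is handled here.
  if (len >= 8 && ReadBE32(data + 4) == len - 8) {
    return Lz4Layout::kHadoopBlock;
  }
  // A bare LZ4 block has no header, so it cannot be positively identified.
  return Lz4Layout::kRawBlockOrUnknown;
}
```

A raw block could match the size check by coincidence, so a real fix
would still need to fall back to plain block decompression on failure.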

- Wes

On Tue, Aug 7, 2018 at 11:45 AM, ALeX Wang <[email protected]> wrote:
> Hi Wes,
>
> Are you talking about this ?
> http://mail-archives.apache.org/mod_mbox/arrow-issues/201805.mbox/%[email protected]%3E
>
> I compiled against the latest Arrow, which contains this fix, and still
> encountered the corruption error.
>
> We also tried to read the file using pyparquet and Spark; neither worked.
>
> Thanks,
> Alex Wang,
>
>
> On Tue, 7 Aug 2018 at 08:37, Wes McKinney <[email protected]> wrote:
>
>> hi Alex,
>>
>> I think there was an e-mail thread or JIRA about this; I would have to
>> dig it up. LZ4 compression was originally underspecified (has that
>> been fixed?) and we aren't using the correct compressor/decompressor
>> options in parquet-cpp at the moment. If you have time to dig in and
>> fix it, it would be much appreciated. Note that the LZ4 code lives in
>> Apache Arrow.
>>
>> - Wes
>>
>> On Tue, Aug 7, 2018 at 11:10 AM, ALeX Wang <[email protected]> wrote:
>> > Hi,
>> >
>> > I would like to confirm my observation.
>> >
>> > We use parquet-mr (Java) to generate Parquet files with LZ4
>> > compression. To do this we have to compile and install the Hadoop
>> > native library, which provides the LZ4 codec.
>> >
>> > However, the generated Parquet file is not readable by parquet-cpp. I
>> > encountered the following error when using the `tools/parquet_reader`
>> > binary:
>> >
>> > ```
>> > Parquet error: Arrow error: IOError: Corrupt Lz4 compressed data.
>> > ```
>> >
>> > Searching online led me to this JIRA ticket:
>> > https://issues.apache.org/jira/browse/HADOOP-12990
>> >
>> > So, since Hadoop's LZ4 is incompatible with the open source LZ4
>> > implementation, is parquet-mr's LZ4 output not compatible with
>> > parquet-cpp?
>> >
>> > Thanks,
>> > --
>> > Alex Wang,
>> > Open vSwitch developer
>>
>
>
> --
> Alex Wang,
> Open vSwitch developer
