Hi Alex,

I think there was an e-mail thread or JIRA about this; I would have to
dig it up. LZ4 compression was originally underspecified (has that
been fixed?), and we aren't using the correct compressor/decompressor
options in parquet-cpp at the moment. If you have time to dig in and
fix it, it would be much appreciated. Note that the LZ4 code lives in
Apache Arrow.

- Wes

On Tue, Aug 7, 2018 at 11:10 AM, ALeX Wang <ee07b...@gmail.com> wrote:
> Hi,
>
> I would like to confirm my observation.
>
> We use parquet-mr (Java) to generate Parquet files with LZ4 compression.  To
> do this, we have to compile and install the Hadoop native library, which
> provides the LZ4 codec.
>
> However, the generated Parquet file cannot be read by parquet-cpp.  I
> encountered the following error when using the `tools/parquet_reader` binary:
>
> ```
> Parquet error: Arrow error: IOError: Corrupt Lz4 compressed data.
> ```
>
> Further searching online led me to this JIRA ticket:
> https://issues.apache.org/jira/browse/HADOOP-12990
>
> So, since Hadoop's LZ4 is incompatible with the open-source LZ4, is LZ4 output
> from parquet-mr not compatible with parquet-cpp?
>
> Thanks,
> --
> Alex Wang,
> Open vSwitch developer
