Since there hasn't been any other movement on this, we need to disable
writing LZ4-compressed files until this can be investigated more
thoroughly. If someone wants to submit a patch, that would be helpful;
otherwise I can take a look in the next couple of days.

On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Well, it depends how important speed is, but LZ4 has extremely fast
> decompression, even compared to Snappy:
> https://github.com/lz4/lz4#benchmarks
>
> Regards
>
> Antoine.
>
>
> On 02/07/2020 at 19:47, Christian Hudon wrote:
> > At least for us, the advantages of Parquet are speed and interoperability
> > in the context of longer-term data storage, so I would tend to say
> > "reasonably conservative".
> >
> > On Wed, Jul 1, 2020 at 9:32 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> >
> >>
> >> I don't have a sense of how conservative Parquet users generally are.
> >> Is it worth adding a LZ4_FRAMED compression option in the Parquet
> >> format, or would people just not use it?
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On Tue, 30 Jun 2020 14:33:17 +0200
> >> "Uwe L. Korn" <uw...@xhochy.com> wrote:
> >>> I'm also in favor of disabling support for now. Having to deal with
> >>> broken files or the detection of various incompatible implementations in
> >>> the long-term will harm more than not supporting LZ4 for a while. Snappy is
> >>> generally more used than LZ4 in this category as it has been available
> >>> since the inception of Parquet and thus should be considered as a viable
> >>> alternative.
> >>>
> >>> Cheers
> >>> Uwe
> >>>
> >>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> >>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>
> >>>>>
> >>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
> >>>>>> hi folks,
> >>>>>>
> >>>>>> (cross-posting to dev@arrow and dev@parquet since there are
> >>>>>> stakeholders in both places)
> >>>>>>
> >>>>>> It seems there are still problems at least with the C++ implementation
> >>>>>> of LZ4 compression in Parquet files
> >>>>>>
> >>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
> >>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
> >>>>>
> >>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
> >>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
> >>>>> compression algorithms available, and they span different parts of the
> >>>>> speed/compression spectrum, so it would be a pity to disable one of them.
> >>>>
> >>>> That's true; however, I think it's worse to write LZ4-compressed files
> >>>> that cannot be read by other Parquet implementations (if that's what's
> >>>> happening, as I understand it). If we are indeed shipping something
> >>>> broken, then we should either fix it or disable it until it can be
> >>>> fixed.
> >>>>
> >>>>> Regards
> >>>>>
> >>>>> Antoine.
> >>>>
> >>>
> >>
> >>
> >>
> >>
> >
