I didn’t say to disable _reading_ them, only writing them.

On Mon, Jul 13, 2020 at 4:15 AM Antoine Pitrou <anto...@python.org> wrote:

>
> I'm not sure that's a good idea.  There are probably Parquet files that
> are only ever used with the Arrow implementation (Arrow C++, Arrow
> Python, Arrow R...).
>
> I admit I'm also not terribly bothered about this, since the Parquet
> community itself doesn't seem to care much about the issue (it has been
> known for a long time and they could have solved it long ago).
>
> Regards
>
> Antoine.
>
>
> On 13/07/2020 at 00:11, Wes McKinney wrote:
> > Since there hasn't been other movement on this, we need to disable
> > writing LZ4-compressed files until this can be investigated more
> > thoroughly. If someone wants to submit a patch, that would be helpful;
> > otherwise I can take a look in the next couple of days.
> >
> > On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >>
> >> Well, it depends how important speed is, but LZ4 has extremely fast
> >> decompression, even compared to Snappy:
> >> https://github.com/lz4/lz4#benchmarks
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 02/07/2020 at 19:47, Christian Hudon wrote:
> >>> At least for us, the advantages of Parquet are speed and interoperability
> >>> in the context of longer-term data storage, so I would tend to say
> >>> "reasonably conservative".
> >>>
> >>> On Wed, Jul 1, 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
> >>>
> >>>>
> >>>> I don't have a sense of how conservative Parquet users generally are.
> >>>> Is it worth adding an LZ4_FRAMED compression option in the Parquet
> >>>> format, or would people just not use it?
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> On Tue, 30 Jun 2020 14:33:17 +0200
> >>>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
> >>>>> I'm also in favor of disabling support for now. Having to deal with
> >>>>> broken files or the detection of various incompatible implementations
> >>>>> in the long-term will harm more than not supporting LZ4 for a while.
> >>>>> Snappy is generally more used than LZ4 in this category as it has been
> >>>>> available since the inception of Parquet and thus should be considered
> >>>>> as a viable alternative.
> >>>>>
> >>>>> Cheers
> >>>>> Uwe
> >>>>>
> >>>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> >>>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
> >>>>>>>> hi folks,
> >>>>>>>>
> >>>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
> >>>>>>>> stakeholders in both places)
> >>>>>>>>
> >>>>>>>> It seems there are still problems at least with the C++ implementation
> >>>>>>>> of LZ4 compression in Parquet files
> >>>>>>>>
> >>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
> >>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
> >>>>>>>
> >>>>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
> >>>>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
> >>>>>>> compression algorithms available, and they span different parts of the
> >>>>>>> speed/compression spectrum, so it would be a pity to disable one of them.
> >>>>>>
> >>>>>> It's true, however I think it's worse to write LZ4-compressed files
> >>>>>> that cannot be read by other Parquet implementations (if that's what's
> >>>>>> happening as I understand it?). If we are indeed shipping something
> >>>>>> broken then we either should fix it or disable it until it can be
> >>>>>> fixed.
> >>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Antoine.
