I didn’t say to disable _reading_ them, only writing them.

On Mon, Jul 13, 2020 at 4:15 AM Antoine Pitrou <anto...@python.org> wrote:
>
> I'm not sure that's a good idea. There are probably Parquet files that
> are only ever used with the Arrow implementation (Arrow C++, Arrow
> Python, Arrow R...).
>
> I admit I'm also not terribly bothered about this, since the Parquet
> community itself doesn't seem to care much about the issue (it has been
> known for a long time and they could have solved it long ago).
>
> Regards
>
> Antoine.
>
>
> On 13/07/2020 at 00:11, Wes McKinney wrote:
> > Since there hasn't been other movement on this, we need to disable
> > writing LZ4-compressed files until this can be investigated more
> > thoroughly. If someone wants to submit a patch that would be helpful
> > otherwise I can take a look in the next couple days
> >
> > On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
> >>
> >>
> >> Well, it depends how important speed is, but LZ4 has extremely fast
> >> decompression, even compared to Snappy:
> >> https://github.com/lz4/lz4#benchmarks
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 02/07/2020 at 19:47, Christian Hudon wrote:
> >>> At least for us, the advantages of Parquet are speed and interoperability
> >>> in the context of longer-term data storage, so I would tend to say
> >>> "reasonably conservative".
> >>>
> >>> On Wed, Jul 1, 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
> >>>
> >>>>
> >>>> I don't have a sense of how conservative Parquet users generally are.
> >>>> Is it worth adding a LZ4_FRAMED compression option in the Parquet
> >>>> format, or would people just not use it?
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> On Tue, 30 Jun 2020 14:33:17 +0200
> >>>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
> >>>>> I'm also in favor of disabling support for now. Having to deal with
> >>>>> broken files or the detection of various incompatible implementations in
> >>>>> the long-term will harm more than not supporting LZ4 for a while. Snappy is
> >>>>> generally more used than LZ4 in this category as it has been available
> >>>>> since the inception of Parquet and thus should be considered as a viable
> >>>>> alternative.
> >>>>>
> >>>>> Cheers
> >>>>> Uwe
> >>>>>
> >>>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> >>>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
> >>>>>>>> hi folks,
> >>>>>>>>
> >>>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
> >>>>>>>> stakeholders in both places)
> >>>>>>>>
> >>>>>>>> It seems there are still problems at least with the C++ implementation
> >>>>>>>> of LZ4 compression in Parquet files
> >>>>>>>>
> >>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
> >>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
> >>>>>>>
> >>>>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
> >>>>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
> >>>>>>> compression algorithms available, and they span different parts of the
> >>>>>>> speed/compression spectrum, so it would be a pity to disable one of them.
> >>>>>>
> >>>>>> It's true, however I think it's worse to write LZ4-compressed files
> >>>>>> that cannot be read by other Parquet implementations (if that's what's
> >>>>>> happening as I understand it?). If we are indeed shipping something
> >>>>>> broken then we either should fix it or disable it until it can be
> >>>>>> fixed.
> >>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Antoine.