Agreed, but even then, if some Parquet files are generated inside of a well-defined system which only needs to be interoperable with itself, it's not necessarily harmful to allow LZ4 compression when writing new files.

Regards

Antoine.


On 13/07/2020 at 17:07, Wes McKinney wrote:
> I didn’t say to disable _reading_ them, only writing them.
>
> On Mon, Jul 13, 2020 at 4:15 AM Antoine Pitrou <anto...@python.org> wrote:
>
>> I'm not sure that's a good idea. There are probably Parquet files that
>> are only ever used with the Arrow implementation (Arrow C++, Arrow
>> Python, Arrow R...).
>>
>> I admit I'm also not terribly bothered about this, since the Parquet
>> community itself doesn't seem to care much about the issue (it has been
>> known for a long time and they could have solved it long ago).
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 13/07/2020 at 00:11, Wes McKinney wrote:
>>> Since there hasn't been other movement on this, we need to disable
>>> writing LZ4-compressed files until this can be investigated more
>>> thoroughly. If someone wants to submit a patch that would be helpful
>>> otherwise I can take a look in the next couple days
>>>
>>> On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
>>>>
>>>> Well, it depends how important speed is, but LZ4 has extremely fast
>>>> decompression, even compared to Snappy:
>>>> https://github.com/lz4/lz4#benchmarks
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>>
>>>> On 02/07/2020 at 19:47, Christian Hudon wrote:
>>>>> At least for us, the advantages of Parquet are speed and interoperability
>>>>> in the context of longer-term data storage, so I would tend to say
>>>>> "reasonably conservative".
>>>>>
>>>>> On Wed, Jul 1, 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
>>>>>
>>>>>> I don't have a sense of how conservative Parquet users generally are.
>>>>>> Is it worth adding a LZ4_FRAMED compression option in the Parquet
>>>>>> format, or would people just not use it?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>>
>>>>>> On Tue, 30 Jun 2020 14:33:17 +0200
>>>>>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>>>>>>> I'm also in favor of disabling support for now. Having to deal with
>>>>>>> broken files or the detection of various incompatible implementations in
>>>>>>> the long-term will harm more than not supporting LZ4 for a while. Snappy is
>>>>>>> generally more used than LZ4 in this category as it has been available
>>>>>>> since the inception of Parquet and thus should be considered as a viable
>>>>>>> alternative.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Uwe
>>>>>>>
>>>>>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
>>>>>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>>>
>>>>>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
>>>>>>>>>> hi folks,
>>>>>>>>>>
>>>>>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
>>>>>>>>>> stakeholders in both places)
>>>>>>>>>>
>>>>>>>>>> It seems there are still problems at least with the C++ implementation
>>>>>>>>>> of LZ4 compression in Parquet files
>>>>>>>>>>
>>>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
>>>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
>>>>>>>>>
>>>>>>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
>>>>>>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
>>>>>>>>> compression algorithms available, and they span different parts of the
>>>>>>>>> speed/compression spectrum, so it would be a pity to disable one of them.
>>>>>>>>
>>>>>>>> It's true, however I think it's worse to write LZ4-compressed files
>>>>>>>> that cannot be read by other Parquet implementations (if that's what's
>>>>>>>> happening as I understand it?). If we are indeed shipping something
>>>>>>>> broken then we either should fix it or disable it until it can be
>>>>>>>> fixed.
>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> Antoine.