I'm not sure that's a good idea. There are probably Parquet files that are only ever used with the Arrow implementation (Arrow C++, Arrow Python, Arrow R...).
I admit I'm also not terribly bothered about this, since the Parquet community itself doesn't seem to care much about the issue (it has been known for a long time and they could have solved it long ago).

Regards

Antoine.

On 13/07/2020 at 00:11, Wes McKinney wrote:
> Since there hasn't been other movement on this, we need to disable
> writing LZ4-compressed files until this can be investigated more
> thoroughly. If someone wants to submit a patch, that would be helpful;
> otherwise I can take a look in the next couple of days.
>
> On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>> Well, it depends how important speed is, but LZ4 has extremely fast
>> decompression, even compared to Snappy:
>> https://github.com/lz4/lz4#benchmarks
>>
>> Regards
>>
>> Antoine.
>>
>> On 02/07/2020 at 19:47, Christian Hudon wrote:
>>> At least for us, the advantages of Parquet are speed and interoperability
>>> in the context of longer-term data storage, so I would tend to say
>>> "reasonably conservative".
>>>
>>> On Wed, Jul 1, 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
>>>>
>>>> I don't have a sense of how conservative Parquet users generally are.
>>>> Is it worth adding an LZ4_FRAMED compression option to the Parquet
>>>> format, or would people just not use it?
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>> On Tue, 30 Jun 2020 14:33:17 +0200
>>>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>>>>> I'm also in favor of disabling support for now. Having to deal with
>>>>> broken files or the detection of various incompatible implementations in
>>>>> the long term will do more harm than not supporting LZ4 for a while. Snappy
>>>>> is more widely used than LZ4 in this category, as it has been available
>>>>> since the inception of Parquet, and thus should be considered a viable
>>>>> alternative.
>>>>>
>>>>> Cheers
>>>>> Uwe
>>>>>
>>>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
>>>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>
>>>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
>>>>>>>> hi folks,
>>>>>>>>
>>>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
>>>>>>>> stakeholders in both places)
>>>>>>>>
>>>>>>>> It seems there are still problems, at least with the C++ implementation
>>>>>>>> of LZ4 compression in Parquet files:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
>>>>>>>
>>>>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
>>>>>>> I'd like to mention that LZ4 and Zstandard are the two most efficient
>>>>>>> compression algorithms available, and they span different parts of the
>>>>>>> speed/compression spectrum, so it would be a pity to disable one of them.
>>>>>>
>>>>>> It's true; however, I think it's worse to write LZ4-compressed files
>>>>>> that cannot be read by other Parquet implementations (if that's what's
>>>>>> happening, as I understand it?). If we are indeed shipping something
>>>>>> broken, then we should either fix it or disable it until it can be
>>>>>> fixed.
>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Antoine.
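Uwe's concern about "detection of various incompatible implementations" can be made concrete with a small sketch. This is not code from Arrow or any Parquet implementation: `guess_lz4_framing` is a hypothetical helper, and the byte layouts it checks are assumptions noted in the comments (the LZ4 frame format magic number 0x184D2204, and a Hadoop-style prefix of two big-endian 32-bit lengths in front of a raw LZ4 block). It only illustrates why a reader handed bytes labeled simply "LZ4" may have to guess which framing the writer used.

```python
import struct

# Assumption: LZ4 frame format magic number, stored little-endian at offset 0.
LZ4_FRAME_MAGIC = 0x184D2204


def guess_lz4_framing(buf: bytes, uncompressed_size: int) -> str:
    """Hypothetical heuristic: guess how an 'LZ4' page buffer was framed.

    Returns "lz4_frame", "hadoop_lz4" or "lz4_raw_block". This is a guess,
    not a validation: nothing is decompressed.
    """
    # LZ4 frame format: the buffer starts with the 4-byte magic number.
    if len(buf) >= 4 and struct.unpack("<I", buf[:4])[0] == LZ4_FRAME_MAGIC:
        return "lz4_frame"
    # Assumption: Hadoop-style framing prefixes the block with a big-endian
    # uncompressed length and a big-endian compressed length. Only the
    # single-block case is checked here.
    if len(buf) >= 8:
        raw_len, block_len = struct.unpack(">II", buf[:8])
        if raw_len == uncompressed_size and 8 + block_len == len(buf):
            return "hadoop_lz4"
    # Otherwise assume a bare LZ4 block with no framing at all.
    return "lz4_raw_block"


if __name__ == "__main__":
    # A fake 13-byte buffer shaped like a single Hadoop-style block.
    page = struct.pack(">II", 100, 5) + b"\x00" * 5
    print(guess_lz4_framing(page, 100))  # prints "hadoop_lz4"
```

Because a raw block could in principle match one of these prefixes by coincidence, any such guessing is fragile, which is the long-term cost Uwe describes.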