I'll volunteer to disable writing/reading LZ4. I'll submit a patch in the next few days.
On 2020/07/12 22:11:33, Wes McKinney <wesmck...@gmail.com> wrote:
> Since there hasn't been other movement on this, we need to disable
> writing LZ4-compressed files until this can be investigated more
> thoroughly. If someone wants to submit a patch that would be helpful
> otherwise I can take a look in the next couple days
>
> On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> > Well, it depends how important speed is, but LZ4 has extremely fast
> > decompression, even compared to Snappy:
> > https://github.com/lz4/lz4#benchmarks
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 02/07/2020 at 19:47, Christian Hudon wrote:
> > > At least for us, the advantages of Parquet are speed and interoperability
> > > in the context of longer-term data storage, so I would tend to say
> > > "reasonably conservative".
> > >
> > > On Wed, 1 Jul 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
> > >
> > >>
> > >> I don't have a sense of how conservative Parquet users generally are.
> > >> Is it worth adding a LZ4_FRAMED compression option in the Parquet
> > >> format, or would people just not use it?
> > >>
> > >> Regards
> > >>
> > >> Antoine.
> > >>
> > >>
> > >> On Tue, 30 Jun 2020 14:33:17 +0200
> > >> "Uwe L. Korn" <uw...@xhochy.com> wrote:
> > >>> I'm also in favor of disabling support for now. Having to deal with
> > >>> broken files or the detection of various incompatible implementations in
> > >>> the long-term will harm more than not supporting LZ4 for a while. Snappy
> > >>> is generally more used than LZ4 in this category as it has been available
> > >>> since the inception of Parquet and thus should be considered as a viable
> > >>> alternative.
> > >>>
> > >>> Cheers
> > >>> Uwe
> > >>>
> > >>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> > >>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
> > >>>>>
> > >>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
> > >>>>>> hi folks,
> > >>>>>>
> > >>>>>> (cross-posting to dev@arrow and dev@parquet since there are
> > >>>>>> stakeholders in both places)
> > >>>>>>
> > >>>>>> It seems there are still problems at least with the C++ implementation
> > >>>>>> of LZ4 compression in Parquet files
> > >>>>>>
> > >>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
> > >>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
> > >>>>>
> > >>>>> I don't have any particular opinion on how to solve the LZ4 issue, but
> > >>>>> I'd like to mention that LZ4 and ZStandard are the two most efficient
> > >>>>> compression algorithms available, and they span different parts of the
> > >>>>> speed/compression spectrum, so it would be a pity to disable one of them.
> > >>>>
> > >>>> It's true, however I think it's worse to write LZ4-compressed files
> > >>>> that cannot be read by other Parquet implementations (if that's what's
> > >>>> happening as I understand it?). If we are indeed shipping something
> > >>>> broken then we either should fix it or disable it until it can be
> > >>>> fixed.
> > >>>>
> > >>>>> Regards
> > >>>>>
> > >>>>> Antoine.