I'm not sure that's a good idea.  There are probably Parquet files that
are only ever used with the Arrow implementation (Arrow C++, Arrow
Python, Arrow R...).

I admit I'm also not terribly bothered about this, since the Parquet
community itself doesn't seem to care much about the issue (it has been
known for a long time and they could have solved it long ago).

Regards

Antoine.


On 13/07/2020 at 00:11, Wes McKinney wrote:
> Since there hasn't been other movement on this, we need to disable
> writing LZ4-compressed files until this can be investigated more
> thoroughly. If someone wants to submit a patch, that would be helpful;
> otherwise I can take a look in the next couple of days.
> 
> On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>>
>> Well, it depends how important speed is, but LZ4 has extremely fast
>> decompression, even compared to Snappy:
>> https://github.com/lz4/lz4#benchmarks
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 02/07/2020 at 19:47, Christian Hudon wrote:
>>> At least for us, the advantages of Parquet are speed and interoperability
>>> in the context of longer-term data storage, so I would tend to say
>>> "reasonably conservative".
>>>
>>> On Wed, Jul 1, 2020 at 09:32, Antoine Pitrou <solip...@pitrou.net> wrote:
>>>
>>>>
>>>> I don't have a sense of how conservative Parquet users generally are.
>>>> Is it worth adding a LZ4_FRAMED compression option in the Parquet
>>>> format, or would people just not use it?
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>>
>>>> On Tue, 30 Jun 2020 14:33:17 +0200
>>>> "Uwe L. Korn" <uw...@xhochy.com> wrote:
>>>>> I'm also in favor of disabling support for now. Having to deal with
>>>>> broken files or the detection of various incompatible implementations in
>>>>> the long-term will harm more than not supporting LZ4 for a while. Snappy is
>>>>> generally more used than LZ4 in this category as it has been available
>>>>> since the inception of Parquet and thus should be considered as a viable
>>>>> alternative.
>>>>>
>>>>> Cheers
>>>>> Uwe
>>>>>
>>>>> On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
>>>>>> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 25/06/2020 at 00:02, Wes McKinney wrote:
>>>>>>>> hi folks,
>>>>>>>>
>>>>>>>> (cross-posting to dev@arrow and dev@parquet since there are
>>>>>>>> stakeholders in both places)
>>>>>>>>
>>>>>>>> It seems there are still problems at least with the C++
>>>>>>>> implementation of LZ4 compression in Parquet files
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1241
>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-1878
>>>>>>>
>>>>>>> I don't have any particular opinion on how to solve the LZ4 issue,
>>>>>>> but I'd like to mention that LZ4 and ZStandard are the two most
>>>>>>> efficient compression algorithms available, and they span different
>>>>>>> parts of the speed/compression spectrum, so it would be a pity to
>>>>>>> disable one of them.
>>>>>>
>>>>>> It's true, however I think it's worse to write LZ4-compressed files
>>>>>> that cannot be read by other Parquet implementations (if that's what's
>>>>>> happening as I understand it?). If we are indeed shipping something
>>>>>> broken then we either should fix it or disable it until it can be
>>>>>> fixed.
>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Antoine.
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
