Re: Pitch for Pcodec in Parquet (again)

Micah Kornfield Tue, 01 Apr 2025 23:48:31 -0700

Apologies, I have been delayed in drafting this. I should have something to
share by the end of this week.


On Thursday, March 20, 2025, Micah Kornfield <[email protected]> wrote:

> Based on the in-person sync, I took an action item to try to write a draft
> doc so we can come to a clear consensus on how to decide on new
> encodings/compression.  I hope to have something to share next week but it
> will likely need further input from the community.
>
> Thanks,
> Micah
>
> On Thu, Mar 20, 2025 at 3:33 AM Antoine Pitrou <[email protected]> wrote:
>
>> On Tue, 18 Mar 2025 19:08:04 +0100
>> Alkis Evlogimenos
>> <[email protected]>
>> wrote:
>> > At the end it boils down to which dataset you think is more
>> > representative of the world data.
>>
>> This sentence does not even have a precise meaning. Data is plural,
>> there is no "representative" dataset.
>>
>> If someone tells you that the average animal on Earth is 2 millimeters
>> long, is that "representative" of the characteristics of mammals?
>>
>> In the end, the question is whether a new encoding brings enough
>> benefits in *some* cases to justify including it in Parquet. You may
>> care primarily about Databricks customers, but some people don't. This
>> is not a Databricks project.
>>
>> Regards
>>
>> Antoine.
>>
>>
>>
