Based on the in-person sync, I took an action item to write a draft
doc so we can reach a clear consensus on how to decide on new
encodings/compression. I hope to have something to share next week, but it
will likely need further input from the community.

Thanks,
Micah

On Thu, Mar 20, 2025 at 3:33 AM Antoine Pitrou <anto...@python.org> wrote:

> On Tue, 18 Mar 2025 19:08:04 +0100
> Alkis Evlogimenos
> <alkis.evlogime...@databricks.com.INVALID>
> wrote:
> > At the end it boils down to which dataset you think is more
> representative
> > of the world data.
>
> This sentence does not even have a precise meaning. Data is plural,
> there is no "representative" dataset.
>
> If someone tells you that the average animal on Earth is 2 millimeters
> long, is that "representative" of the characteristics of mammals?
>
> In the end, the question is whether a new encoding brings enough
> benefits in *some* cases to justify including it in Parquet. You may
> care primarily about Databricks customers, but some people don't. This
> is not a Databricks project.
>
> Regards
>
> Antoine.