Based on the in-person sync, I took an action item to write a draft doc so we can reach a clear consensus on how to decide on new encodings/compression. I hope to have something to share next week, but it will likely need further input from the community.
Thanks,
Micah

On Thu, Mar 20, 2025 at 3:33 AM Antoine Pitrou <anto...@python.org> wrote:

> On Tue, 18 Mar 2025 19:08:04 +0100
> Alkis Evlogimenos
> <alkis.evlogime...@databricks.com.INVALID>
> wrote:
> > At the end it boils down to which dataset you think is more representative
> > of the world data.
>
> This sentence does not even have a precise meaning. Data is plural,
> there is no "representative" dataset.
>
> If someone tells you that the average animal on Earth is 2 millimeters
> long, is that "representative" of the characteristics of mammals?
>
> In the end, the question is whether a new encoding brings enough
> benefits in *some* cases to justify including it in Parquet. You may
> care primarily about Databricks customers, but some people don't. This
> is not a Databricks project.
>
> Regards
>
> Antoine.