Apologies, I have been delayed in drafting this. I should have something to share by the end of this week.

On Thursday, March 20, 2025, Micah Kornfield <emkornfi...@gmail.com> wrote:

> Based on the in person sync, I took an action item to try to write a draft
> doc so we can come to a clear consensus on how to decide on new
> encodings/compression. I hope to have something to share next week but it
> will likely need further input from the community.
>
> Thanks,
> Micah
>
> On Thu, Mar 20, 2025 at 3:33 AM Antoine Pitrou <anto...@python.org> wrote:
>
>> On Tue, 18 Mar 2025 19:08:04 +0100
>> Alkis Evlogimenos
>> <alkis.evlogime...@databricks.com.INVALID>
>> wrote:
>> > At the end it boils down to which dataset you think is more representative
>> > of the world data.
>>
>> This sentence does not even have a precise meaning. Data is plural,
>> there is no "representative" dataset.
>>
>> If someone tells you that the average animal on Earth is 2 millimeters
>> long, is that "representative" of the characteristics of mammals?
>>
>> In the end, the question is whether a new encoding brings enough
>> benefits in *some* cases to justify including it in Parquet. You may
>> care primarily about Databricks customers, but some people don't. This
>> is not a Databricks project.
>>
>> Regards
>>
>> Antoine.