It sounds like everybody is happy with the proposal. Tomorrow is the Parquet sync, so we can finalize then.

On Wed, Jul 24, 2024 at 9:20 AM Julien Le Dem <jul...@apache.org> wrote:

> Hi Alkis,
> I saw you addressed and resolved the comments in the doc. Thank you.
> This looks good to me.
> I would recommend others that have been active in this conversation to
> take a final look.
> Best
> Julien
>
> On Tue, Jul 23, 2024 at 3:06 PM Julien Le Dem <jul...@apache.org> wrote:
>
>> I am also OK with the proposed solution in the document.
>> However I think the doc itself needs one last wording change.
>> I have left more details in comments but here is the gist:
>> This effort is driven by a group of people in the community and not one
>> vendor in particular even if said people do sometimes work for vendors.
>> To reflect this, instead of saying the UUID identifies a Vendor, we
>> should describe it as an extension ID.
>> Then I'd remove all instances of the word "Vendor" and instead
>> refer to "Extensions" identified by this UUID.
>> This might not change anything to the implementation but it is important
>> to reflecting how the community works in the document.
>>
>> Specifically:
>>
>> "Vendor introduces a Flatbuffers variant of FileMetaData." => "This
>> extension introduces a Flatbuffers variant of FileMetaData..."
>>
>> "The UUID is picked by the Vendor once and used throughout the
>> experiments." => "The UUID is picked for this specific extension and used
>> throughout the experiments."
>>
>> "At some point Vendor decides that this is amazing and should be shared
>> with the world at large to advance Parquet. " => "At some point, the
>> community decides this extension is ready and proposed for inclusion."
>>
>> On Mon, Jul 22, 2024 at 10:11 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Hi Alkis,
>>> Thanks for the revision. I'm OK with this as is, we can maybe wait a few
>>> more days to see if anybody else has comments and then discuss
>>> implementation of the extension mechanism?
>>>
>>> Cheers,
>>> Micah
>>>
>>> On Thu, Jul 18, 2024 at 10:22 PM Alkis Evlogimenos
>>> <alkis.evlogime...@databricks.com.invalid> wrote:
>>>
>>> > After Jul 17th's Parquet Sync feedback I have updated the extensions
>>> > proposal to remove the "reservation" mechanism. The updates are already
>>> > reflected in the document
>>> > <https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit>
>>> > and the PR <https://github.com/apache/parquet-format/pull/254>.
>>> >
>>> > On Fri, Jun 28, 2024 at 10:02 AM Alkis Evlogimenos
>>> > <alkis.evlogime...@databricks.com> wrote:
>>> >
>>> > > > I think we can at least have wording to encourage people doing
>>> > > > extensions to post them publicly and as part of the "reservation"
>>> > > > mechanism post a link the repo that they are being developed in,
>>> > > > if anyone is curious.
>>> > >
>>> > > Good point. I will try to come up with something in the PR - unless
>>> > > you beat me to it :)
>>> > >
>>> > > On Fri, Jun 28, 2024 at 7:15 AM Micah Kornfield <emkornfi...@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> > 1. experimentation/prototyping is more often than not faster to
>>> > >> > iterate if it is closed. Allowing this model of development was a
>>> > >> > primary goal of the design.
>>> > >>
>>> > >> I agree there are advantages here. I think a large amount of speed
>>> > >> comes from not having to gain consensus in the community.
>>> > >>
>>> > >> At the end of the day, I don't think there is any mechanism here to
>>> > >> ensure everybody works in public, but I think we can at least have
>>> > >> wording to encourage people doing extensions to post them publicly
>>> > >> and as part of the "reservation" mechanism post a link the repo that
>>> > >> they are being developed in, if anyone is curious.
>>> > >> I think this would be particularly useful if there really is an
>>> > >> intent for a number of organizations to experiment with new footer
>>> > >> designs (but possibly also in others).
>>> > >>
>>> > >> Thanks,
>>> > >> Micah
>>> > >>
>>> > >> On Wed, Jun 26, 2024 at 9:33 AM Alkis Evlogimenos
>>> > >> <alkis.evlogime...@databricks.com.invalid> wrote:
>>> > >>
>>> > >> > Thank you for taking a look Micah.
>>> > >> >
>>> > >> > On the topic of openness there are various aspects that we have
>>> > >> > considered.
>>> > >> > 1. experimentation/prototyping is more often than not faster to
>>> > >> > iterate if it is closed. Allowing this model of development was a
>>> > >> > primary goal of the design.
>>> > >> > 2. when the design is final, keeping the design closed should have
>>> > >> > some drawbacks. Duplicating content to support old readers puts some
>>> > >> > natural incentive to make extensions official because at that point
>>> > >> > one can drop the fat from the files and move on. Another aspect of
>>> > >> > the design is the choice of a single extension field-id which makes
>>> > >> > the extension space tiny. This in turn means that it is difficult to
>>> > >> > interop with others without breaking their extensions. Ergo the
>>> > >> > easiest path to any interop is to open the extension.
>>> > >> >
>>> > >> > The above, while not enforcing work to happen in the open, strike
>>> > >> > some balance in between.
>>> > >> >
>>> > >> > I am open to suggestions on how to further incentivize opening
>>> > >> > extensions.
>>> > >> >
>>> > >> > On Wed, Jun 26, 2024 at 6:04 PM Micah Kornfield
>>> > >> > <emkornfi...@gmail.com> wrote:
>>> > >> >
>>> > >> > > Hi Alkis,
>>> > >> > > I'm generally in favor of this, my main concern/question is trying
>>> > >> > > to encourage work to be in the open.
>>> > >> > > I don't think in the long run it is good for users to always have
>>> > >> > > proprietary extensions inside of Parquet.
>>> > >> > >
>>> > >> > > IMO, I think the next steps would be to add implementations to
>>> > >> > > write out the footer extension points.
>>> > >> > >
>>> > >> > > Thanks,
>>> > >> > > Micah
>>> > >> > >
>>> > >> > > On Mon, Jun 24, 2024 at 1:24 PM Alkis Evlogimenos
>>> > >> > > <alkis.evlogime...@databricks.com.invalid> wrote:
>>> > >> > >
>>> > >> > > > The snafus are fixed. The original should work now.
>>> > >> > > >
>>> > >> > > > On Sun, 23 Jun 2024, 17:58 Alkis Evlogimenos,
>>> > >> > > > <alkis.evlogime...@databricks.com> wrote:
>>> > >> > > >
>>> > >> > > > > Due to some sharing snafus with automation, please request
>>> > >> > > > > access to comment. If you are just reading I've published
>>> > >> > > > > this here:
>>> > >> > > > >
>>> > >> > > > > https://docs.google.com/document/d/e/2PACX-1vThXkhHNozn_p1ZZWF-nCzOtoP1lKmkaV4Legq2FaRiIgwyY2XC9AmKpBtpeF8jbBB4wfjmQ6UTg03k/pub
>>> > >> > > > >
>>> > >> > > > > On Fri, Jun 21, 2024 at 10:29 AM Alkis Evlogimenos
>>> > >> > > > > <alkis.evlogime...@databricks.com> wrote:
>>> > >> > > > >
>>> > >> > > > >> Hey folks.
>>> > >> > > > >>
>>> > >> > > > >> I want to move the extension PR
>>> > >> > > > >> <https://github.com/apache/parquet-format/pull/254> forward.
>>> > >> > > > >> Unfortunately the discussion was spread across the PR, other
>>> > >> > > > >> threads and documents making it slow to progress. To avoid
>>> > >> > > > >> further fragmentation I have put together a document
>>> > >> > > > >> <https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit>
>>> > >> > > > >> discussing the extensions mechanism in isolation.
>>> > >> > > > >> I believe the document addresses all the concerns/comments
>>> > >> > > > >> from the PR and mailing list discussions brought forward so
>>> > >> > > > >> far.
>>> > >> > > > >>
>>> > >> > > > >> I propose we continue the discussion in the document and once
>>> > >> > > > >> everything is addressed, we finalize the PR.
>>> > >> > > > >>
>>> > >> > > > >> Thank you,