Hi Dan,

Thanks for raising this. The document asks 4 questions; I'll put my own thoughts here:

> Do we agree the time-based, per-feature gate is challenging in the
> long-term as the primary coordination mechanism?

I don't agree with this at the moment. I think we can revisit this when we see concrete issues arising; until then, it seems like we are making more work based on assumptions about where the ecosystem is headed. I think one still faces mostly the same decisions with a monolithic version scheme (you still need to decide when the new version becomes the default). I agree with Andrew that building out the compatibility matrix helps people make informed decisions here.

> Is a deliberately curated capability level — "V3" — the right vehicle for
> setting shared reader/writer expectations, building on issue #384?

I think until we see otherwise the Parquet spec version or feature year is sufficient here. In general, what I've observed for monolithic version numbering is that it either:

1. Slows down adoption of things people really want to use and know won't cause breakages, or
2. Doesn't really help with adoption, because implementers still pick and choose what they see as valuable or have time to do. We end up with a lot of implementations saying they support version "3" with the exception of features X, Y, Z.

That being said, I think it would be useful for implementations to consider how to let users choose on the spectrum from "conservative for compatibility" to "bleeding edge" without having to toggle each feature individually. As mentioned above, I think using Parquet format version or feature year to help toggle these is likely a good place to start. I also feel that this is something that implementations can choose on their own.

> Which features belong in the first such bundle, given how far adoption of
> things like Variant and Geo has already run ahead?

Variant and Geo are probably a good discussion point. What do you feel would change with adoption of these types if Parquet were to start having numbered versions again?
How has it hindered their adoption not having a V3 to include them in? How would having a V3 increase their adoption?

> How far can and should we go in aligning the magic number, footer version,
> and release version?

In practice this feels like generally more churn than it is worth. There are many Parquet readers out there that don't align with parquet-java versioning, so users are still going to be reading the release notes or the compatibility matrix to figure out what they need/want. It would be good to make sure implementations actually follow the spec on "footer version", so it can be used as a knob at some point.

Cheers,
Micah

On Tue, Jun 2, 2026 at 5:55 AM Andrew Lamb <[email protected]> wrote:

> Thank you Dan, this is a very clear document
>
> I think this is the most important part and worth posting to the mailing
> list
>
> > The two-year norm also doesn't match what's actually happening in the
> > ecosystem. Some features are entering mainstream usage well ahead of any
> > such window — Variant and the Geo types are being adopted aggressively by
> > writers and engines because the demand is real and immediate.
>
> In my opinion, the Apache Parquet mailing list contributors, committers,
> and PMC does not (and should not) have the luxury of mandating adoption
> trends (either slower or faster) across the ecosystem.
>
> As this point in your document makes clear, there are many Parquet
> stakeholders, each with different constraints and needs, that will adopt
> the features at their own rate. We shouldn't be trying to hold them back
> from using new features.
>
> I believe the best thing we can do as a community is foster clear
> communication that helps implementers make the best decisions for their own
> adoption.
> Specifically this is embodied in the "implementation status"
> page[1] which we can and should continue to evolve to let Parquet users
> choose the feature set that is right for them
>
> Andrew
>
> [1]: https://parquet.apache.org/docs/file-format/implementationstatus/
>
> On Mon, Jun 1, 2026 at 5:36 PM Daniel Weeks <[email protected]> wrote:
>
> > Hey Parquet Community,
> >
> > A few weeks back during one of the community syncs, the topic of
> > versioning came up (again) and I offered to pull together some thoughts
> > on how we might want to move forward.
> >
> > I've gathered some of the background and concerns about how we address
> > versioning across the ecosystem in order to have a discussion and gather
> > feedback.
> >
> > There are a lot of new features and major capabilities that community
> > members are eager to introduce, so it would be great to have a clear path
> > forward on how to coordinate changes.
> >
> > I've included the discussion in a doc
> > <https://docs.google.com/document/d/1zrbGT4kRCEdadBUludwfQR9b2CfLgH-RWn9zE84gYfg/edit?tab=t.0#heading=h.aozivdm2oj4d>
> > so that people can comment and respond either in-line or on this thread.
> >
> > Looking forward to discussion and feedback!
> > -Dan
