Hi everyone,

Here are my notes from the sync. They're also published in the agenda/notes doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.dcom1tfn61k>. If you have any additions or corrections, feel free to reply! And if you want to join us at the next sync meeting, you can add yourself to the invite group <https://groups.google.com/g/iceberg-sync>.
- Highlights
  - Gradle upgraded to 7.2 (Thanks, Eduard!)
  - The roadmap is on the site (Thanks, Jack and Eduard!)
  - Flink supports anonymous tables (Thanks, OpenInx!)
  - New writer classes are being added (Thanks, Anton!)
  - Job planning takes deletes into account (Thanks, WinkerDu!)
- Releases
  - Are we planning to make the next release 1.0.0?
    - From the discussion on 1.0, the consensus was: probably not. We will target 0.13.0
  - Projects were created for the roadmap; please link issues/PRs when you see unlinked ones
  - Planned features for the next release (0.13.0/1.0.0)?
    - Please start marking issues to be included
- Discussion
  - Tabular and the Iceberg community <https://tabular.io/blog/tabular-and-the-iceberg-community/>
    - Ryan: I just want to highlight that we wrote a blog post on how Tabular will interact with the Iceberg community. Please read it and reach out if you have any questions. We want to support the community and continue working to keep it healthy.
  - Roadmap next steps
    - Coordinators: please volunteer to coordinate roadmap projects and we'll work on getting permissions set up.
  - Iceberg 1.0 tasks
    - Summary: after discussing it, we decided that most people expect the v2 features to be fully usable and baked for Iceberg 1.0, and we shouldn't make 1.0 about API stability only. As a result, we will work to finish delete compaction before 1.0, as well as API stability.
    - Jack: delete compaction is needed; Spark streaming can come later
    - Ryan: Is feature completeness in some areas part of 1.0?
      - Kyle: delete compaction seems relevant
      - Dan: there is confusion about 1.0 because the v2 features are not there. We need to be clear about this
      - Jack: we need good documentation if we consider the format version separate from the package version. The view spec will also be versioned
      - Jack: API stability could be 1.0, but we are trying to have time-based releases
    - We'll include v2 features in 1.0
      - Jack: delete compaction
      - Steven: some MVP features (like delete compaction)
      - Anton: it makes sense to include this for API stability
      - Delete compaction is the rough consensus
    - Kyle: what about Python?
      - Ryan: separate versioning for different languages
      - Dan: it's a barrier to releases if not separate
    - Ryan: What does API stability look like?
      - Jack: one or two time-based releases to get the v2 work done
        - Also the writer interfaces
      - Dan: What are we calling stable? Add semver checking to the build (see the sketch after these notes)
        - Which jars have guarantees?
      - Anton: core is more developer-facing than API
        - Document which APIs are internal?
        - Annotations or module-based guarantees?
        - Core is a problem
          - Anton: could promote to core
          - Ryan: need to audit the APIs to determine
      - Ryan: What about just using lower guarantees for core?
        - 1 point release
      - Ryan: issues for binary compatibility checks
      - Kyle: we should have tests across versions
        - Badrul, Anton: +1 for testing against older versions
        - Not part of 1.0
  - Spark version support strategy
    - Anton: sounds like we need some support for older versions
      - Option 2: separate repo
      - Option 3: different modules (see the sketch after these notes)
      - Option 3 (modified): fewer modules
    - Anton: What is the main difference between option 3 and option 3 modified?
      - Ryan: The main purpose is to get as close to the benefit of a separate repo as possible, but still make it possible to run CI with the whole project. The main drawback of option 2 is that changes to the main Iceberg project can easily break Spark. If Spark is relying on snapshot builds, then that will break all Spark PRs until it is fixed, which is a bad experience for contributors. Keeping Spark in the main repo, but making it so that only the latest supported Spark version is built by default, allows most people to work on the latest Spark, while people that care about older versions can continue maintaining those. People working on Spark can concentrate on a single version. The drawback is that changes to core may still need to update all Spark modules. I think this is probably the best option because we should avoid breaking changes in core anyway.
    - Note: There were other perspectives here as well, but we agreed to bring them up on the dev list, so follow the Spark version thread.
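For concreteness, here is one shape the semver checking Dan suggested could take. This is a minimal sketch using the japicmp Gradle plugin in the Kotlin DSL; the plugin version, the baseline coordinates, and the task name are illustrative assumptions, not anything we decided, and exact property names vary by plugin version:

    // build.gradle.kts -- illustrative sketch only, not the project's actual config
    import me.champeau.gradle.japicmp.JapicmpTask

    plugins {
        `java-library`
        id("me.champeau.gradle.japicmp") version "0.3.0"  // version is illustrative
    }

    // A configuration that resolves the last released jar to diff against
    // (hypothetical baseline coordinates).
    val baseline: Configuration by configurations.creating
    dependencies {
        baseline("org.apache.iceberg:iceberg-api:0.12.1")
    }

    tasks.register<JapicmpTask>("checkBinaryCompatibility") {
        oldClasspath.from(baseline)                              // released jar
        newClasspath.from(tasks.jar.flatMap { it.archiveFile })  // jar from this build
        onlyBinaryIncompatibleModified.set(true)  // only report breaking changes
        failOnModification.set(true)              // fail the build when any are found
        ignoreMissingClasses.set(true)            // skip unresolved transitive deps
    }

    // Run the check as part of the normal verification lifecycle.
    tasks.named("check") { dependsOn("checkBinaryCompatibility") }

A setup like this would also make "which jars have guarantees" concrete: only the modules that wire in the check are making a compatibility promise.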
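And here is a rough sketch of how option 3 for Spark version support could work in practice: only the newest Spark module is wired into the build by default, and older versions are opt-in. The module names and the opt-in property are made up for illustration:

    // settings.gradle.kts -- hypothetical sketch of per-version Spark modules
    rootProject.name = "iceberg"

    // Modules everyone builds, including only the latest supported Spark version.
    include("iceberg-api", "iceberg-core", "iceberg-spark-3.2")

    // Older Spark versions are opt-in, e.g.:
    //   ./gradlew build -DallSparkVersions=true
    if (System.getProperty("allSparkVersions").toBoolean()) {
        include("iceberg-spark-3.1", "iceberg-spark-3.0", "iceberg-spark-2.4")
    }

This keeps CI for the whole project in one repo (Ryan's point above) while letting most contributors build against a single Spark version.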
- Topics that weren't covered due to time
  - Spark read/write conf
  - Flink next steps for FLIP-27

-- 
Ryan Blue
Tabular