Hey Iceberg Community,

Here are the minutes and recording from our Iceberg Sync that took place on *February 9th, 9am-10am PT*.
Always remember, anyone can join the discussion, so feel free to share the Iceberg-Sync <https://groups.google.com/g/iceberg-sync> Google group with anyone who is seeking an invite. The notes and the agenda are posted in the live doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web> that's also attached to the meeting invitation, and it's a good place to add items as you see fit so we can discuss them in the next community sync.

Meeting Recording ⭕ <https://drive.google.com/file/d/1m5J6oHZs-fGQulMaWJ7q6joJcJ06FeeW/view>

Top of the Meeting Highlights
- New Iceberg site released with versioned docs (Thanks Sam!)
- 0.13.0 released (Thanks Jack!)
- Spark 3.2 with Scala 2.13 support was added (Thanks Farooq!)

0.13.1 Release
- This will be prioritized for a release in the very near future. A regression was detected where predicates from ON clauses are not pushed down.

0.14.0 Release
- V2 row-level deletes
- Z-ordering
- CRT (common runtime) support, which should increase S3 upload/download throughput to 90 GB/s
- Glue optimistic locking (DynamoDB tables are no longer needed to handle locking)
- REST catalog implementation
- View support
  - The spec PR is pending a merge
  - Once merged, expect a vote on the mailing list
- Target release date: early-to-mid March

Docs Contributions
- Docs contributions should still primarily be made against the `apache/iceberg` repo.
- `docs/common` and `docs/versioned` directories have been added, which contain only markdown files. Please open PRs against either of those directories.
- During a version release, the release manager will move the files over from `apache/iceberg` to `apache/iceberg-docs`. This is documented in the iceberg-docs README (thanks Jack!).
- Hotfixes for docs that have already been released can be made directly against that version's branch in `apache/iceberg-docs`.

FileIO Metrics
- Flink and Spark rely on Hadoop filesystem metrics that other FileIO implementations (e.g. S3FileIO) do not provide. Draft PR #4050 <https://github.com/apache/iceberg/pull/4050> addresses this issue.
- There's an open question about other metrics beyond those originally included in the Hadoop filesystem: how far can we push this functionality to produce features around data observability?
- A pluggable design here may be better to avoid scope creep.
- This should use standard interfaces that anyone can plug into with whichever metrics tool they'd like to use (similar to notifications). A rough sketch of such an interface appears at the end of these notes.
- Some consideration is required, such as differentiating what was read from S3 from what was actually used by Spark when generating certain metrics.

Change Data Capture (PR #3941 <https://github.com/apache/iceberg/issues/3941>)
- Solutions that don't require any specification changes are currently being explored, since changing the spec would be very intrusive and would probably require an entirely new spec version. Backwards compatibility would also be difficult to achieve.
- A change feed generated between two snapshots can relatively easily determine INSERTs and DELETEs; UPDATEs, however, are challenging.
- One proposal is to infer UPDATEs by collectively analyzing INSERTs and DELETEs, using a primary key that's provided by the user (see the second sketch at the end of these notes).
- A design doc is being finalized around this and will be shared soon.

Thanks everyone for participating!
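P.S. For those curious, here is a minimal sketch of what a pluggable FileIO metrics interface could look like, assuming a reporter-style design along the lines discussed above. Everything here (the `FileIOMetricsReporter` interface, its `count` method, and the metric names) is hypothetical and invented for illustration; it is not the API in PR #4050.

```java
// Hypothetical sketch only -- NOT the API from PR #4050. It illustrates the
// "standard interface" idea: a FileIO implementation emits counters through a
// small reporter interface, and users plug in whichever metrics tool they like.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class FileIOMetricsSketch {

  /** Hypothetical pluggable reporter; a Dropwizard or Micrometer bridge would implement this. */
  public interface FileIOMetricsReporter {
    void count(String metric, long amount);
  }

  /** A trivial in-memory implementation, useful for tests or as a default. */
  public static class InMemoryReporter implements FileIOMetricsReporter {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    @Override
    public void count(String metric, long amount) {
      counters.computeIfAbsent(metric, k -> new LongAdder()).add(amount);
    }

    public long value(String metric) {
      LongAdder adder = counters.get(metric);
      return adder == null ? 0L : adder.sum();
    }
  }

  public static void main(String[] args) {
    InMemoryReporter reporter = new InMemoryReporter();
    // A FileIO implementation (e.g. S3FileIO) would call the reporter as it reads:
    reporter.count("s3.read.bytes", 8192);
    reporter.count("s3.read.requests", 1);
    System.out.println("bytes read: " + reporter.value("s3.read.bytes"));
  }
}
```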
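And a second sketch for the CDC discussion: one way to infer UPDATEs from the INSERTs and DELETEs between two snapshots, given a user-provided primary key. The types and logic below are invented for illustration and are not the design being finalized for #3941; the real proposal would operate on Iceberg snapshots rather than in-memory maps.

```java
// Hypothetical sketch only -- NOT the design from #3941. A primary key that
// appears in both the deleted and inserted sets between two snapshots is
// emitted as an UPDATE; the remaining keys stay plain DELETEs and INSERTs.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CdcInferenceSketch {

  enum ChangeType { INSERT, DELETE, UPDATE }

  /** One inferred change; before/after hold the row payloads (null when absent). */
  record Change(ChangeType type, String key, String before, String after) {}

  /** Classify changes between two snapshots; each map is primary key -> row payload. */
  static List<Change> inferChanges(Map<String, String> deleted, Map<String, String> inserted) {
    List<Change> changes = new ArrayList<>();
    for (Map.Entry<String, String> del : deleted.entrySet()) {
      String after = inserted.get(del.getKey());
      if (after != null) {
        // Same primary key deleted and re-inserted: infer an UPDATE.
        changes.add(new Change(ChangeType.UPDATE, del.getKey(), del.getValue(), after));
      } else {
        changes.add(new Change(ChangeType.DELETE, del.getKey(), del.getValue(), null));
      }
    }
    for (Map.Entry<String, String> ins : inserted.entrySet()) {
      if (!deleted.containsKey(ins.getKey())) {
        changes.add(new Change(ChangeType.INSERT, ins.getKey(), null, ins.getValue()));
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, String> deleted = Map.of("id-1", "v1", "id-2", "old");
    Map<String, String> inserted = Map.of("id-2", "new", "id-3", "v3");
    // id-2 is classified as an UPDATE; id-1 stays a DELETE, id-3 an INSERT.
    inferChanges(deleted, inserted).forEach(System.out::println);
  }
}
```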