Hi Iceberg Community,

Here are the minutes and recording from our Iceberg Sync.
Always remember, anyone can join the discussion, so feel free to share the Iceberg-Sync <https://groups.google.com/g/iceberg-sync> Google group with anyone seeking an invite. The notes and the agenda are posted in the Iceberg Sync doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web>, which is also attached to the meeting invitation. It's an excellent place to add items as you see fit so we can discuss them in the following community sync.

Meeting Recording <https://drive.google.com/file/d/1-bGfpK9Kv-_g0V8Y2JzNlFtPWa3590tV/view?usp=sharing>
Meeting Transcript <https://docs.google.com/document/d/14aiT56MhwLfrn_cAjggO6-p2-OGcL8fhCNj9H5j4aoc/edit?usp=sharing>

- Highlights
  - PyIceberg 0.2.1 is released (Thanks, Fokko!)
  - Python projection by field IDs is in (Thanks, Fokko!)
  - Storage-partitioned joins in Spark are supported (Thanks, Yufei and Anton!)
  - Arrow environment settings are defaulted (Thanks, Anton!)
  - Spark changelog readers were added (Thanks, Yufei!)
- Releases
  - Python 0.3.0
  - 1.2.0
    - Default distribution mode for Spark MERGE
    - SHOW TABLES EXTENDED
    - Szehon’s delete metadata table for delete file compaction <https://github.com/apache/iceberg/pull/6365>
    - Branch commits for operations other than append and delete
    - Vectorized Arrow read path fix for dictionary-encoded values <https://github.com/apache/iceberg/pull/3024>
    - Parquet: Fixes Incorrect Skipping of RowGroups with NaNs <https://github.com/apache/iceberg/pull/6517>
- Discussion
  - Delta to Iceberg conversion: snapshot a Delta Lake table to an Iceberg table <https://github.com/apache/iceberg/pull/6449>
  - General Expression PR in Spark <https://github.com/apache/spark/pull/38823#discussion_r1066610931>
  - Partition stats tracking <https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit>
  - Defining prefix ownership in the format (v3?) <https://github.com/apache/iceberg/issues/4159>
  - Materialized view proposal <https://github.com/apache/iceberg/issues/6420>
  - Data lake paper by Databricks folks (CIDR 2023) <https://www.cidrdb.org/cidr2023/papers/p92-jain.pdf>
  - MERGE: default distribution mode is None
    - Need to set a default distribution mode
  - Async file downloads

Thanks everyone!