Meeting Minutes from 2023-01-11 Iceberg Sync

Eduard Tudenhoefner Thu, 12 Jan 2023 11:29:20 -0800

Hi Iceberg Community,

Here are the minutes and recording from our Iceberg Sync.


As always, anyone can join the discussion, so feel free to share the
Iceberg-Sync <https://groups.google.com/g/iceberg-sync> Google group with
anyone seeking an invite.
The notes and the agenda are posted in the Iceberg Sync doc
<https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web>,
which is also attached to the meeting invitation. It's an excellent place to
add items as you see fit so we can discuss them in the following community sync.

Meeting Recording
<https://drive.google.com/file/d/1-bGfpK9Kv-_g0V8Y2JzNlFtPWa3590tV/view?usp=sharing>
⭕

Meeting Transcript
<https://docs.google.com/document/d/14aiT56MhwLfrn_cAjggO6-p2-OGcL8fhCNj9H5j4aoc/edit?usp=sharing>

   - Highlights
      - PyIceberg 0.2.1 is released (Thanks, Fokko!)
      - Python projection by field IDs is in (Thanks, Fokko!)
      - Storage-partitioned joins in Spark are supported (Thanks, Yufei and
        Anton!)
      - Arrow environment settings are defaulted (Thanks, Anton!)
      - Spark changelog readers were added (Thanks, Yufei!)

   - Releases
      - Python 0.3.0
      - 1.2.0
         - Default distribution mode for Spark MERGE
         - SHOW TABLES EXTENDED
         - Szehon’s delete metadata table for delete file compaction
           <https://github.com/apache/iceberg/pull/6365>
         - Branch commits for operations other than append and delete
         - Vectorized Arrow read path fix for dictionary-encoded values
           <https://github.com/apache/iceberg/pull/3024>
         - Parquet: Fixes Incorrect Skipping of RowGroups with NaNs
           <https://github.com/apache/iceberg/pull/6517>

   - Discussion
      - Delta Iceberg Conversion: Snapshot a delta lake table to an iceberg
        table <https://github.com/apache/iceberg/pull/6449>
         - General Expression PR in Spark
           <https://github.com/apache/spark/pull/38823#discussion_r1066610931>
      - Partition stats tracking
        <https://docs.google.com/document/d/1vaufuD47kMijz97LxM67X8OX-W2Wq7nmlz3jRo8J5Qk/edit>
      - Defining prefix ownership in the format (v3?)
        <https://github.com/apache/iceberg/issues/4159>
      - Materialized view proposal
        <https://github.com/apache/iceberg/issues/6420>
      - Datalake Paper by Databricks folks
        <https://www.cidrdb.org/cidr2023/papers/p92-jain.pdf>
         - MERGE: default distribution mode is None
         - Need to set a default distribution mode
         - Async file downloads


Thanks, everyone!
