Hey Iceberg Community,

Here are the minutes and recording from our Iceberg Sync that took place on *February 9th, 9am-10am PT*.
Always remember, anyone can join the discussion, so feel free to share the Iceberg-Sync <https://groups.google.com/g/iceberg-sync> Google group with anyone who is seeking an invite. The notes and the agenda are posted in the live doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web> that's also attached to the meeting invitation, and it's a good place to add items as you see fit so we can discuss them in the next community sync.

Meeting Recording ⭕ <https://drive.google.com/file/d/1m5J6oHZs-fGQulMaWJ7q6joJcJ06FeeW/view>

Top of the Meeting Highlights
- New Iceberg site released with versioned docs (Thanks Sam!)
- 0.13.0 released (Thanks Jack!)
- Spark 3.2 with Scala 2.13 support was added (Thanks Farooq!)

0.13.1 Release
- This will be prioritized for a release in the very near future. A regression was detected where predicates from ON clauses are not pushed down.

0.14.0 Release
- V2 row-level deletes
- Z-ordering
- CRT (common runtime) support, which should increase S3 upload/download throughput to 90 GB/s
- Glue optimistic locking (DynamoDB tables are no longer needed to handle locking)
- REST catalog implementation
- View support
  - The spec PR is pending a merge
  - Once merged, expect a vote on the mailing list
- Target release date: early-to-mid March

Docs Contributions
- Docs contributions should still primarily be made against the `apache/iceberg` repo.
- `docs/common` and `docs/versioned` directories have been added, which contain only markdown files. Please open PRs against either of those directories.
- During a version release, the release manager will move the files over from `apache/iceberg` to `apache/iceberg-docs`. This is documented in the iceberg-docs README (thanks Jack!).
- Hotfixes for docs that have already been released can be made directly against that version's branch in `apache/iceberg-docs`.

FileIO Metrics
- Flink and Spark rely on Hadoop filesystem metrics that other FileIO implementations (e.g. S3FileIO) do not provide. Draft PR #4050 <https://github.com/apache/iceberg/pull/4050> addresses this issue.
- There's an open question about other metrics beyond those originally included in the Hadoop filesystem: how far can we push this functionality to produce features around data observability?
- A pluggable design here may be better to avoid scope creep.
- This should use standard interfaces that anyone can plug into with whichever metrics tool they'd like to use (similar to notifications). A rough sketch of such an interface appears at the end of these notes.
- Some consideration is required, such as differentiating what was read from S3 from what was actually used by Spark when generating certain metrics.

Change Data Capture (PR #3941 <https://github.com/apache/iceberg/issues/3941>)
- Solutions that don't require any specification changes are currently being explored, since changing the spec would be very intrusive and would probably require an entirely new spec version. Backwards compatibility would also be difficult to achieve.
- A change feed generated between two snapshots can relatively easily determine INSERTs and DELETEs; UPDATEs, however, are challenging.
- One proposal is to infer UPDATEs by collectively analyzing INSERTs and DELETEs, using a primary key that's provided by the user (see the second sketch at the end of these notes).
- A design doc is being finalized around this and will be shared soon.

Thanks everyone for participating!
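P.S. For those curious, here is a minimal sketch of what a pluggable FileIO metrics interface could look like, assuming a reporter-style design along the lines discussed above. Everything here (the `FileIOMetricsReporter` interface, its `count` method, and the metric names) is hypothetical and invented for illustration; it is not the API in PR #4050.

```java
// Hypothetical sketch only -- NOT the API from PR #4050. It illustrates the
// "standard interface" idea: a FileIO implementation emits counters through a
// small reporter interface, and users plug in whichever metrics tool they like.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class FileIOMetricsSketch {

  /** Hypothetical pluggable reporter; a Dropwizard or Micrometer bridge would implement this. */
  public interface FileIOMetricsReporter {
    void count(String metric, long amount);
  }

  /** A trivial in-memory implementation, useful for tests or as a default. */
  public static class InMemoryReporter implements FileIOMetricsReporter {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    @Override
    public void count(String metric, long amount) {
      counters.computeIfAbsent(metric, k -> new LongAdder()).add(amount);
    }

    public long value(String metric) {
      LongAdder adder = counters.get(metric);
      return adder == null ? 0L : adder.sum();
    }
  }

  public static void main(String[] args) {
    InMemoryReporter reporter = new InMemoryReporter();
    // A FileIO implementation (e.g. S3FileIO) would call the reporter as it reads:
    reporter.count("s3.read.bytes", 8192);
    reporter.count("s3.read.requests", 1);
    System.out.println("bytes read: " + reporter.value("s3.read.bytes"));
  }
}
```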
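And a second sketch for the CDC discussion: one way to infer UPDATEs from the INSERTs and DELETEs between two snapshots, given a user-provided primary key. The types and logic below are invented for illustration and are not the design being finalized for #3941; the real proposal would operate on Iceberg snapshots rather than in-memory maps.

```java
// Hypothetical sketch only -- NOT the design from #3941. A primary key that
// appears in both the deleted and inserted sets between two snapshots is
// emitted as an UPDATE; the remaining keys stay plain DELETEs and INSERTs.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CdcInferenceSketch {

  enum ChangeType { INSERT, DELETE, UPDATE }

  /** One inferred change; before/after hold the row payloads (null when absent). */
  record Change(ChangeType type, String key, String before, String after) {}

  /** Classify changes between two snapshots; each map is primary key -> row payload. */
  static List<Change> inferChanges(Map<String, String> deleted, Map<String, String> inserted) {
    List<Change> changes = new ArrayList<>();
    for (Map.Entry<String, String> del : deleted.entrySet()) {
      String after = inserted.get(del.getKey());
      if (after != null) {
        // Same primary key deleted and re-inserted: infer an UPDATE.
        changes.add(new Change(ChangeType.UPDATE, del.getKey(), del.getValue(), after));
      } else {
        changes.add(new Change(ChangeType.DELETE, del.getKey(), del.getValue(), null));
      }
    }
    for (Map.Entry<String, String> ins : inserted.entrySet()) {
      if (!deleted.containsKey(ins.getKey())) {
        changes.add(new Change(ChangeType.INSERT, ins.getKey(), null, ins.getValue()));
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, String> deleted = Map.of("id-1", "v1", "id-2", "old");
    Map<String, String> inserted = Map.of("id-2", "new", "id-3", "v3");
    // id-2 is classified as an UPDATE; id-1 stays a DELETE, id-3 an INSERT.
    inferChanges(deleted, inserted).forEach(System.out::println);
  }
}
```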