Hi everyone,

Here are my notes from the sync. They're also published in the agenda/notes doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.dcom1tfn61k>. If you have any additions or corrections, feel free to reply! And if you want to join us at the next sync meeting, you can add yourself to the invite group <https://groups.google.com/g/iceberg-sync>.
- Highlights
  - Gradle upgraded to 7.2 (Thanks, Eduard!)
  - The roadmap is on the site (Thanks, Jack and Eduard!)
  - Flink supports anonymous tables (Thanks, OpenInx!)
  - New writer classes are being added (Thanks, Anton!)
  - Job planning takes deletes into account (Thanks, WinkerDu!)
- Releases
  - Are we planning to make the next release 1.0.0?
    - From the discussion on 1.0, the consensus was: probably not. We will target 0.13.0
  - Projects were created for the roadmap; please link issues/PRs when you see unlinked ones
  - Planned features for the next release (0.13.0/1.0.0)?
    - Please start marking issues to be included
- Discussion
  - Tabular and the Iceberg community <https://tabular.io/blog/tabular-and-the-iceberg-community/>
    - Ryan: I just want to highlight that we wrote a blog post on how Tabular will interact with the Iceberg community. Please read it and reach out if you have any questions. We want to support the community and continue working to keep it healthy.
  - Roadmap next steps
    - Coordinators: please volunteer to coordinate roadmap projects and we'll work on getting permissions set up.
  - Iceberg 1.0 tasks
    - Summary: after discussing it, we decided that most people expect the v2 features to be fully usable and baked for Iceberg 1.0, and we shouldn't make 1.0 about API stability only. As a result, we will work to finish delete compaction before 1.0, as well as API stability.
    - Jack: delete compaction is needed; Spark streaming can come later
    - Ryan: Is feature completeness in some areas part of 1.0?
      - Kyle: delete compaction seems relevant
      - Dan: there is confusion about 1.0 because the v2 features are not there. We need to be clear about this
      - Jack: we need good documentation if we consider the format version separate from the package version. The view spec will also be versioned
      - Jack: API stability could be 1.0, but we are trying to have time-based releases
    - We'll include v2 features in 1.0
      - Jack: delete compaction
      - Steven: some MVP features (like delete compaction)
      - Anton: it makes sense to include this for API stability
      - Delete compaction is the rough consensus
    - Kyle: what about Python?
      - Ryan: separate versioning for different languages
      - Dan: it's a barrier to releases if not separate
    - Ryan: What does API stability look like?
      - Jack: one or two time-based releases to get the v2 work done
        - Also the writer interfaces
      - Dan: What are we calling stable? Add semver checking to the build (see the sketch after these notes)
        - Which jars have guarantees?
      - Anton: core is more developer-facing than API
        - Document which APIs are internal?
        - Annotations or module-based guarantees?
        - Core is a problem
          - Anton: could promote to core
          - Ryan: need to audit the APIs to determine
      - Ryan: What about just using lower guarantees for core?
        - 1 point release
      - Ryan: issues for binary compatibility checks
      - Kyle: we should have tests across versions
        - Badrul, Anton: +1 for testing against older versions
        - Not part of 1.0
  - Spark version support strategy
    - Anton: sounds like we need some support for older versions
      - Option 2: separate repo
      - Option 3: different modules (see the sketch after these notes)
      - Option 3 (modified): fewer modules
    - Anton: What is the main difference between option 3 and option 3 modified?
      - Ryan: The main purpose is to get as close to the benefit of a separate repo as possible, but still make it possible to run CI with the whole project. The main drawback of option 2 is that changes to the main Iceberg project can easily break Spark. If Spark is relying on snapshot builds, then that will break all Spark PRs until it is fixed, which is a bad experience for contributors. Keeping Spark in the main repo, but making it so that only the latest supported Spark version is built by default, allows most people to work on the latest Spark, while people that care about older versions can continue maintaining those. People working on Spark can concentrate on a single version. The drawback is that changes to core may still need to update all Spark modules. I think this is probably the best option because we should avoid breaking changes in core anyway.
    - Note: There were other perspectives here as well, but we agreed to bring them up on the dev list, so follow the Spark version thread.
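For concreteness, here is one shape the semver checking Dan suggested could take. This is a minimal sketch using the japicmp Gradle plugin in the Kotlin DSL; the plugin version, the baseline coordinates, and the task name are illustrative assumptions, not anything we decided, and exact property names vary by plugin version:

    // build.gradle.kts -- illustrative sketch only, not the project's actual config
    import me.champeau.gradle.japicmp.JapicmpTask

    plugins {
        `java-library`
        id("me.champeau.gradle.japicmp") version "0.3.0"  // version is illustrative
    }

    // A configuration that resolves the last released jar to diff against
    // (hypothetical baseline coordinates).
    val baseline: Configuration by configurations.creating
    dependencies {
        baseline("org.apache.iceberg:iceberg-api:0.12.1")
    }

    tasks.register<JapicmpTask>("checkBinaryCompatibility") {
        oldClasspath.from(baseline)                              // released jar
        newClasspath.from(tasks.jar.flatMap { it.archiveFile })  // jar from this build
        onlyBinaryIncompatibleModified.set(true)  // only report breaking changes
        failOnModification.set(true)              // fail the build when any are found
        ignoreMissingClasses.set(true)            // skip unresolved transitive deps
    }

    // Run the check as part of the normal verification lifecycle.
    tasks.named("check") { dependsOn("checkBinaryCompatibility") }

A setup like this would also make "which jars have guarantees" concrete: only the modules that wire in the check are making a compatibility promise.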
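And here is a rough sketch of how option 3 for Spark version support could work in practice: only the newest Spark module is wired into the build by default, and older versions are opt-in. The module names and the opt-in property are made up for illustration:

    // settings.gradle.kts -- hypothetical sketch of per-version Spark modules
    rootProject.name = "iceberg"

    // Modules everyone builds, including only the latest supported Spark version.
    include("iceberg-api", "iceberg-core", "iceberg-spark-3.2")

    // Older Spark versions are opt-in, e.g.:
    //   ./gradlew build -DallSparkVersions=true
    if (System.getProperty("allSparkVersions").toBoolean()) {
        include("iceberg-spark-3.1", "iceberg-spark-3.0", "iceberg-spark-2.4")
    }

This keeps CI for the whole project in one repo (Ryan's point above) while letting most contributors build against a single Spark version.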
- Topics that weren't covered due to time
  - Spark read/write conf
  - Flink next steps for FLIP-27

-- 
Ryan Blue
Tabular