Meeting Minutes 2024-04-17

Brian Olsen Tue, 23 Apr 2024 12:50:40 -0700

Hey Iceberg Nation,

Here are the meeting minutes from last week's meeting.


Summary: We discussed fixes and improvements made to FileIO, the REST
catalog, and PyIceberg. We also discussed the ID overlap between columns and
partitions and agreed to handle them separately. We considered improvements
for path encoding issues with special characters for the V3 spec. Anton
proposed updating the Spark integration to use the Comet project's native
vectorized readers, which are being worked on. Encryption support is making
good progress; we just need to finalize the key management details. We're
now preparing for the Iceberg Summit, with the agenda and talks to be
announced soon!

Transcription/Recording: https://youtu.be/Bk8mXQ6UAPs

Meeting Notes:

* Highlights
    * REST catalog’s HTTP client supports proxy and timeout config (Thanks,
Harish!)
    * Fixed new FileIO method defaults (Thanks, Amogh!)
    * PyIceberg: Support for writing to partitioned tables that use
identity partitioning (Thanks Adrian!)
    * Rust: Implement projection to perform partition based pruning (Thanks
Scott!)
    * PyIceberg: Adding metadata tables (Thanks (in advance) Gowthami,
Andre, Drew, Kevin, Sung)
    * PyIceberg [Pyodide](https://pyodide.org/en/stable/) integration: This
enables us to run PyIceberg in the browser via WASM, so folks can learn
about Iceberg and table formats without an install. See
https://github.com/pyodide/pyodide/issues/4644 and
https://github.com/pyodide/pyodide/pull/4648. We’re still waiting on the
PyArrow-Pyodide integration.
* Releases
    * Java 1.5.1 Release
        * JDBC Catalog: Fix Escape character in GetNamespace SQL
https://github.com/apache/iceberg/pull/9407
        * JDBC Catalog: Fix JDBC Catalog table commit when migrating from
schema V0 to V1 https://github.com/apache/iceberg/pull/10111
    * PyIceberg 0.6.1 Release –
https://lists.apache.org/thread/pry0n9zm2h27wbbbyslm86hh1o23q2tf
        * Milestone Link:
https://github.com/apache/iceberg-python/pulls?q=is%3Apr+milestone%3A%22PyIceberg+0.6.1%22+is%3Aclosed
* Discussion
    * Field and partition ID overlap in metadata tables and columns
           * Originally, partition IDs started at 1000 to avoid overlap with
column IDs, which start at 1, but collisions are happening now
           * Rather than change the IDs, which would break compatibility,
it is best to handle columns and partitions separately
           * Zehan proposed utility methods to reassign partition IDs as
needed
    * Migrating the community to the REST catalog protocol
            * Goal is not to make REST API the only option, but best for
cross-language usage long-term
            * Some changes needed to improve vendor-agnostic integrations
and TCK validation
            * Still room for other catalog options like JDBC and Hive
metastore wrappers
    * Quotes in S3 locations
(https://github.com/apache/iceberg/issues/10168)
            * Issue with quotes and other special characters in partition
field values causing invalid S3 paths
            * S3 allows more special characters than the URI specification
does, so parsing issues arise
            * May restrict special chars in Iceberg itself for portability
across storage systems
            * Consider standardized path encoding for V3 spec
    * [Comet](https://github.com/apache/arrow-datafusion-comet) in Iceberg.
          * Comet project from Arrow will provide native vectorized readers
for Iceberg
          * Alternative to built-in reader with more features like
vectorized reads
          * Designed for Iceberg, handles projections and metadata
          * Could enable fully native Spark execution
    * Special character in column names
https://github.com/apache/iceberg/issues/10120
    * Spec V3 and 1.6/2.0
            * File and manifest encryption implemented
            * Just need to finalize key management integration
            * May use key metadata in snapshots and REST API, or custom key
providers
    * Iceberg Summit
            * Finalizing talk selection now, will announce agenda next week
            * Very high quality submissions received
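
To make the field and partition ID overlap from the Discussion section
concrete, here is a minimal Python sketch of what a reassignment utility
might look like. It is not Iceberg code: the function names and the example
IDs are hypothetical, and it only shows detecting collisions and mapping the
colliding partition IDs to unused values, for example when building a
combined view such as a metadata table schema.

```python
# Hypothetical sketch; none of these names are Iceberg APIs.
PARTITION_ID_START = 1000  # partition field IDs originally started here


def find_overlaps(column_ids, partition_ids):
    """Return the IDs used by both a column and a partition field."""
    return sorted(set(column_ids) & set(partition_ids))


def reassign_partition_ids(column_ids, partition_ids):
    """Map each partition ID to an ID that no column uses.

    Partition IDs that do not collide keep their value. Colliding ones
    get the next ID above everything currently in use.
    """
    columns = set(column_ids)
    used = columns | set(partition_ids)
    next_id = max(used | {PARTITION_ID_START - 1}) + 1
    mapping = {}
    for pid in partition_ids:
        if pid in columns:
            mapping[pid] = next_id
            next_id += 1
        else:
            mapping[pid] = pid
    return mapping


# A wide table whose column IDs have grown past 1000.
column_ids = range(1, 1002)
partition_ids = [1000, 1001, 1002]
print(find_overlaps(column_ids, partition_ids))
print(reassign_partition_ids(column_ids, partition_ids))
```

The mapping leaves the stored IDs untouched and is only applied where
columns and partition fields have to share one ID space, which matches the
preference above for not changing existing IDs.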
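
For the S3 path issue discussed above, the sketch below illustrates one
possible form of standardized path encoding: percent-encoding partition
values before they are placed in a location. This is an assumption for
illustration only, not a decision from the meeting or the V3 spec. The
helper names and the bucket are hypothetical; `quote` and `unquote` come
from Python's standard `urllib.parse`.

```python
from urllib.parse import quote, unquote


def encode_path_segment(value: str) -> str:
    """Percent-encode everything except letters, digits, and '_.-~'."""
    return quote(value, safe="")


def partition_path(table_location: str, field: str, value: str) -> str:
    """Build a 'field=value' partition directory under a table location."""
    segment = f"{encode_path_segment(field)}={encode_path_segment(value)}"
    return f"{table_location.rstrip('/')}/{segment}"


# A partition value containing a quote and a slash.
path = partition_path("s3://example-bucket/tbl/", "name", 'a"b/c')
print(path)

# The original value can be recovered from the encoded segment.
encoded_value = path.rsplit("=", 1)[1]
print(unquote(encoded_value))
```

Encoding every reserved character keeps the resulting location parseable as
a URI regardless of what the storage system itself would accept, at the cost
of less readable paths.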
