Hey Iceberg Nation,

Here are the minutes from last week's meeting.
Summary: We discussed fixes and improvements made to FileIO, the REST catalog, PyIceberg, and the Spark integration. On the ID overlap between columns and partitions, we agreed to handle the two separately. We considered improvements for path encoding issues with special characters for the V3 spec. Anton proposed updating the Spark integration to use the Comet project's native vectorized readers, which are being worked on. Encryption support is making good progress; we just need to finalize the key management details. We're now preparing for the Iceberg Summit, with the agenda and talks to be announced soon!

Transcription/Recording: https://youtu.be/Bk8mXQ6UAPs

Meeting Notes:
* Highlights
  * REST catalog’s HTTP client supports proxy and timeout config (Thanks, Harish!)
  * Fixed new FileIO method defaults (Thanks, Amogh!)
  * PyIceberg: Support for writing to partitioned tables that use identity partitioning (Thanks, Adrian!); see the short sketch after the notes
  * Rust: Implement projection to perform partition-based pruning (Thanks, Scott!)
  * PyIceberg: Adding metadata tables (Thanks (in advance) Gowthami, Andre, Drew, Kevin, Sung)
  * PyIceberg [Pyodide](https://pyodide.org/en/stable/) integration: this lets us run PyIceberg in the browser via WASM without requiring an install, so folks can learn about Iceberg and table formats (https://github.com/pyodide/pyodide/issues/4644, https://github.com/pyodide/pyodide/pull/4648). We’re still waiting on the PyArrow-Pyodide integration.
* Releases
  * Java 1.5.1 Release
    * JDBC Catalog: Fix escape character in GetNamespace SQL https://github.com/apache/iceberg/pull/9407
    * JDBC Catalog: Fix JDBC Catalog table commit when migrating from schema V0 to V1 https://github.com/apache/iceberg/pull/10111
  * PyIceberg 0.6.1 Release – https://lists.apache.org/thread/pry0n9zm2h27wbbbyslm86hh1o23q2tf
    * Milestone link: https://github.com/apache/iceberg-python/pulls?q=is%3Apr+milestone%3A%22PyIceberg+0.6.1%22+is%3Aclosed
* Discussion
  * Field and partition ID overlap in metadata tables and columns
    * Originally, partition IDs started at 1000 to avoid overlap with column IDs, which start at 1, but collisions are happening now
    * Rather than change the IDs, which would break compatibility, it is best to handle columns and partitions separately
    * Zehan proposed utility methods to reassign partition IDs as needed
  * Migrating the community to the REST catalog protocol
    * The goal is not to make the REST API the only option, but it is the best option for cross-language usage long term
    * Some changes are needed to improve vendor-agnostic integrations and TCK validation
    * There is still room for other catalog options like JDBC and Hive Metastore wrappers
  * Quotes in S3 locations (https://github.com/apache/iceberg/issues/10168)
    * Quotes and other special characters in partition field values can produce invalid S3 paths
    * S3 allows more special characters than the URI specification, so parsing issues arise
    * May restrict special characters in Iceberg itself for portability across storage systems
    * Consider a standardized path encoding for the V3 spec; an illustrative sketch follows after the notes
  * [Comet](https://github.com/apache/arrow-datafusion-comet) in Iceberg
    * The Comet project from Arrow will provide native vectorized readers for Iceberg
    * Alternative to the built-in reader, with more features like vectorized reads
    * Designed for Iceberg; handles projections and metadata
    * Could enable fully native Spark execution
  * Special characters in column names https://github.com/apache/iceberg/issues/10120
  * Spec V3 and 1.6/2.0
    * File and manifest encryption is implemented
    * Just need to finalize the key management integration
    * May use key metadata in snapshots and the REST API, or custom key providers
  * Iceberg Summit
    * Finalizing talk selection now; will announce the agenda next week
    * Very high quality submissions received
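
For anyone who wants to try the identity-partitioned write support highlighted above, here is a minimal sketch using PyIceberg's `load_catalog`, `load_table`, and `Table.append`. The catalog name, namespace, table name, and column names are placeholders, and it assumes an existing table whose partition spec uses an identity transform on `region`.

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Load a catalog configured in ~/.pyiceberg.yaml ("default" is a placeholder name).
catalog = load_catalog("default")

# Assumes an existing table partitioned by an identity transform on "region".
table = catalog.load_table("examples.events")

rows = pa.Table.from_pylist(
    [
        {"region": "us-east-1", "event_id": 1},
        {"region": "eu-west-1", "event_id": 2},
    ]
)

# With identity-partitioned write support, append routes each row to the
# partition matching its "region" value.
table.append(rows)
```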
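On the special-character discussion, one way to picture a standardized path encoding for V3 is percent-encoding partition values before they are embedded in data file paths. This is only an illustration of the idea under that assumption, not the scheme the spec will adopt.

```python
from urllib.parse import quote


def encode_partition_value(value: str) -> str:
    """Percent-encode a partition field value for use in a storage path.

    Illustrative only: encoding everything outside [A-Za-z0-9_.~-] keeps
    quotes, spaces, and other special characters out of S3/URI paths.
    """
    return quote(value, safe="")


print(encode_partition_value('O\'Brien "quoted" value'))
# -> O%27Brien%20%22quoted%22%20value
```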