talatuyarer commented on code in PR #16516: URL: https://github.com/apache/iceberg/pull/16516#discussion_r3284964920
########## site/docs/blog/posts/2026-05-19-iceberg-1.11.0-release.md: ########## @@ -0,0 +1,194 @@ +--- +date: 2026-05-19 +title: Apache Iceberg 1.11.0 Release +slug: apache-iceberg-1.11.0-release +authors: + - iceberg-pmc +categories: + - release +--- + +<!-- + - Licensed to the Apache Software Foundation (ASF) under one or more + - contributor license agreements. See the NOTICE file distributed with + - this work for additional information regarding copyright ownership. + - The ASF licenses this file to You under the Apache License, Version 2.0 + - (the "License"); you may not use this file except in compliance with + - the License. You may obtain a copy of the License at + - + - http://www.apache.org/licenses/LICENSE-2.0 + - + - Unless required by applicable law or agreed to in writing, software + - distributed under the License is distributed on an "AS IS" BASIS, + - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + - See the License for the specific language governing permissions and + - limitations under the License. + --> + +The Apache Iceberg community is pleased to announce the release of Apache Iceberg 1.11.0. This release is the result of over **1,000 commits** from **200+ contributors**. See the [release notes](https://iceberg.apache.org/releases/#1110-release) for the complete list of changes. + +<!-- more --> + +## Release Highlights + +### REST Catalog: A More Complete Protocol + +1.11.0 represents the most significant step forward for the REST catalog protocol since it was introduced. + +[Remote scan planning](https://github.com/apache/iceberg/pull/13400) allows catalog servers to plan table scans and stream back file scan tasks directly. Previously, every client had to fetch manifest lists and manifests itself to determine which files to read. 
With server-side planning, clients receive only the relevant scan tasks, reducing driver memory pressure and enabling server-side optimizations that are transparent to the query engine. This release extends remote scan planning to cover [incremental scans](https://github.com/apache/iceberg/pull/14661) for Structured Streaming workloads and [metadata tables](https://github.com/apache/iceberg/pull/14881) such as `history` and `snapshots`. A [per-table override](https://github.com/apache/iceberg/pull/15572) allows individual tables to opt out of catalog-level scan planning mode when needed. + +[Freshness-aware table loading](https://github.com/apache/iceberg/pull/14398) adds ETag-based caching to table metadata. When a client loads a table it already has metadata for, the server can return a `304 Not Modified` response instead of re-sending the full metadata payload, cutting unnecessary round-trips in tight read loops and interactive workloads. + +[Idempotency key support](https://github.com/apache/iceberg/pull/14740) introduces a standard `Idempotency-Key` header for mutating catalog operations. Retried writes — commits, creates, and drops — are now guaranteed not to execute twice, preventing duplicate snapshots and corrupted state from network timeouts. + +[Register View](https://github.com/apache/iceberg/pull/14868) completes the view lifecycle in the REST catalog. Just as tables can be registered from their metadata location, views can now be [registered via the REST API](https://github.com/apache/iceberg/pull/14870) too — enabling cross-catalog migrations and re-attaching orphaned view metadata. + +[Custom Table and View Operations](https://github.com/apache/iceberg/pull/14465) can now be injected into the REST catalog, allowing users to extend or override default `TableOperations` and `ViewOperations` behavior without forking the catalog implementation. 
+ +### OpenAPI Specification Updates + +Several protocol-level additions land in the OpenAPI spec this release, tightening the contract between clients and catalog servers. + +- **Namespace separator configurable by server**: [The server can now advertise a custom namespace separator](https://github.com/apache/iceberg/pull/14448) in the config endpoint, allowing catalogs that use separators other than `.` to communicate this to clients without out-of-band configuration. +- **ETag for `CommitTableResponse`**: [ETag support on commit responses](https://github.com/apache/iceberg/pull/14760) enables clients to detect whether a concurrent write changed the table between their load and commit, complementing the existing ETag on `LoadTableResult`. +- **S3 signing endpoint promoted to main spec**: [The S3 signing endpoint](https://github.com/apache/iceberg/pull/15450) moves from an extension into the main OpenAPI spec, making it an official part of the REST catalog protocol. +- **Partition statistics in `TableUpdate`**: [`SetPartitionStatisticsUpdate` and `RemovePartitionStatisticsUpdate`](https://github.com/apache/iceberg/pull/14957) are now included in the `TableUpdate` union type, allowing partition stats to be managed through the standard commit path. +- **Storage credentials in scan planning responses**: [Storage credentials are now returned](https://github.com/apache/iceberg/pull/15524) in `PlanTableScanResponse` and `FetchPlanningResultResponse` when the `include-credentials` flag is set, so clients performing remote scan planning can access data without a separate credential fetch. + +### Spec: SQL UDFs and Geospatial Types + +The [SQL UDF Specification](https://github.com/apache/iceberg/pull/14117) introduces a new spec for storing SQL user-defined functions in Iceberg catalogs. UDFs are versioned, support multiple SQL dialects, and are portable across engines, bringing function management into the catalog layer for the first time. 
+ +[Geospatial bounding box types](https://github.com/apache/iceberg/pull/12667) add native bounding box types and an `INTERSECTS` predicate to Iceberg's type system, enabling spatial partition pruning and file skipping for geospatial workloads directly on Iceberg tables. [Restrictions for geometry types in V3](https://github.com/apache/iceberg/pull/14250) are also clarified in this release. + +Several smaller but meaningful spec additions round out the release: + +- **`added-rows` in snapshot fields**: The [`added-rows` field is restored to snapshot metadata](https://github.com/apache/iceberg/pull/14048), giving engines and monitoring tools a reliable row count per snapshot without scanning data files. +- **`referenced-by` in `loadTable` response**: [`loadTable` now returns a `referenced-by` field](https://github.com/apache/iceberg/pull/13810) listing views and other objects that depend on the table, making dependency tracking possible at the protocol level. +- **`scan-planning-mode` in `LoadTableResult`**: [The server can now advertise its preferred scan planning mode](https://github.com/apache/iceberg/pull/14867) in the `LoadTableResult` config, letting clients know upfront whether to use remote or local scan planning without probing. +- **404 for missing warehouse on config endpoint**: [The `/v1/config` endpoint now returns 404](https://github.com/apache/iceberg/pull/15746) when the requested warehouse does not exist, replacing the previous ambiguous error. + +### Performance and Reliability + +[LIMIT pushdown to scan](https://github.com/apache/iceberg/pull/14615) stops scanning after enough rows are found when a query includes a `LIMIT` clause, rather than reading all matching files. For exploratory queries, this can reduce I/O by orders of magnitude. 
+ +Vectorized reads now cover additional Parquet encodings, eliminating the row-at-a-time fallback for [BYTE_STREAM_SPLIT](https://github.com/apache/iceberg/pull/15373), [DELTA_LENGTH_BYTE_ARRAY, and DELTA_BYTE_ARRAY](https://github.com/apache/iceberg/pull/15362). This is particularly impactful for scientific and ML datasets using float or double columns with `BYTE_STREAM_SPLIT` encoding. + +[Snapshot expiration cleanup modes](https://github.com/apache/iceberg/pull/14287) introduce a new `cleanupMode` API that gives finer control over what gets cleaned up when snapshots expire. + +[Unique table locations](https://github.com/apache/iceberg/pull/12892) via a new catalog property append a UUID to table storage paths, preventing a data loss scenario where `DeleteOrphanFiles` could remove files from a renamed table. This also enables per-table storage lifecycle policies and cost attribution. + +Scheduled credential refresh for [AWS S3FileIO](https://github.com/apache/iceberg/pull/15678) and [GCS FileIO](https://github.com/apache/iceberg/pull/15696) proactively rotates credentials before they expire, eliminating transient failures in long-running Spark and Flink jobs that outlive their initial credential lease. + +The [GCSAnalyticsCore library](https://github.com/apache/iceberg/pull/14333) is now integrated into GCSFileIO, bringing analytics-optimized I/O for Google Cloud Storage. The library improves read throughput for large-scale analytical workloads on GCS, complementing the existing AWS Analytics Accelerator integration on S3. + +### Format V4 Foundations + +1.11.0 begins laying the groundwork for Table Format V4. + +New foundational types — [TrackedFile, TrackingInfo, ContentInfo, and ManifestStats](https://github.com/apache/iceberg/pull/15049) — are the building blocks for V4's adaptive metadata tree. 
These interfaces define how Iceberg will track files at scale, with [implementations](https://github.com/apache/iceberg/pull/15854), [builders](https://github.com/apache/iceberg/pull/16092), and [partition support](https://github.com/apache/iceberg/pull/16253) being added iteratively across the release cycle. + +The new [FormatModel abstraction](https://github.com/apache/iceberg/pull/12774) replaces hardcoded file format handling with a pluggable interface. [Parquet](https://github.com/apache/iceberg/pull/15253), [ORC](https://github.com/apache/iceberg/pull/15255), Avro, and Arrow each now implement a `FormatModel` contract, making it simpler to add new formats or customize read/write behavior. + +### Encryption + +[Manifest list encryption](https://github.com/apache/iceberg/pull/7770) extends Iceberg's encryption coverage to manifest list files, ensuring that snapshot-level metadata is protected alongside data files. [Automatic key rotation](https://github.com/apache/iceberg/pull/14396) adds support for rotating key encryption keys without manual intervention. + +[Cloud KMS support](https://github.com/apache/iceberg/pull/15272) now covers AWS, Azure, and GCP through a unified `encryption.kms-type` property. The Hive catalog receives [metadata integrity checks for encrypted tables](https://github.com/apache/iceberg/pull/14685) in this release as well. + +### New APIs + +The [Content Stats API](https://github.com/apache/iceberg/pull/13933) introduces classes for file-level statistics — min/max values, null counts, and more — providing richer information for query planners to make better file-skipping and join-ordering decisions. + +The [Partition Stats Scan API](https://github.com/apache/iceberg/pull/14640) replaces the old single-shot partition stats reader with a composable, filterable interface consistent with the rest of Iceberg's Scan API. 
+ +[Overwrite-aware table registration](https://github.com/apache/iceberg/pull/15525) and [FileIO access in the Scan API](https://github.com/apache/iceberg/pull/15561) round out several API additions in this release. + +### Engine Updates + +#### Spark + +[Spark 4.1](https://github.com/apache/iceberg/pull/14946) is now supported, with several notable additions: + +- **MERGE INTO schema evolution**: [Initial support](https://github.com/apache/iceberg/pull/14970) for schema changes during merge writes, [controllable via table property](https://github.com/apache/iceberg/pull/15825) +- **Shredded Variant writes**: [Support for writing shredded Variant data](https://github.com/apache/iceberg/pull/14297) in Iceberg-Spark, with reading backed by [ArrayData.getVariant](https://github.com/apache/iceberg/pull/14349) for row-based Parquet readers +- **Async micro-batch planner**: [Reduces scheduling overhead](https://github.com/apache/iceberg/pull/15299) in Structured Streaming by planning the next micro-batch asynchronously +- **Adaptive split sizing**: [Session config support](https://github.com/apache/iceberg/pull/16088) for tuning split sizes and parallelism + +**Spark 3.4 support is deprecated** in this release. + +#### Flink + +[Flink 2.1](https://github.com/apache/iceberg/pull/14156) is now supported and Flink 1.19 has been removed. + +**Row lineage** readers for [`_row_id` and `_last_updated_sequence_number`](https://github.com/apache/iceberg/pull/14148) are now available in Flink, with [row lineage preserved through RewriteDataFiles](https://github.com/apache/iceberg/pull/14149) so that row identity survives compaction. 
+ +The Flink Dynamic Sink receives major improvements: + +- **Deletion vector support**: [Efficient row-level deletes](https://github.com/apache/iceberg/pull/14414) without rewriting data files +- **Variant type support**: [Variant is now supported in Flink 2.1](https://github.com/apache/iceberg/pull/15265), enabling semi-structured data ingestion and processing in Flink pipelines +- **Schema evolution**: [Support for dropping columns](https://github.com/apache/iceberg/pull/14728) and [case-insensitive field matching](https://github.com/apache/iceberg/pull/14729) make it easier to evolve schemas without pipeline changes +- **SQL-configurable options**: Dynamic Sink options are now configurable through [Flink SQL](https://github.com/apache/iceberg/pull/15279), with [further SQL options added](https://github.com/apache/iceberg/pull/15780) in this release +- **Post-commit maintenance**: [Arbitrary maintenance tasks can now be registered via the IcebergSink Builder](https://github.com/apache/iceberg/pull/15566), enabling custom post-commit operations without a separate pipeline + +[TableMaintenance now supports a coordinator lock](https://github.com/apache/iceberg/pull/15151) to prevent concurrent maintenance operations from conflicting in distributed deployments, with [coordinator lock support accessible from Flink SQL](https://github.com/apache/iceberg/pull/15459). + +Table maintenance also improves in this release: [RewriteDataFiles now supports a dynamic filter](https://github.com/apache/iceberg/pull/15865) for targeted compaction, and [branch support](https://github.com/apache/iceberg/pull/15672) allows running maintenance tasks against a specific branch. + +[Nanosecond timestamp precision](https://github.com/apache/iceberg/pull/15475) is now supported, matching the full range of Iceberg's timestamp spec and meeting the needs of IoT, financial, and scientific workloads. 
[UUID type support](https://github.com/apache/iceberg/pull/16097) has been added to Avro and Parquet readers and writers. + +#### Hive + +Hive receives a set of correctness and compatibility improvements. [View replacement now updates the query stored in HMS](https://github.com/apache/iceberg/pull/14831), keeping the Hive Metastore in sync when a view's definition changes. [Registering a table now detects if a view with the same name already exists](https://github.com/apache/iceberg/pull/15010), preventing accidental overwrites. The [snapshot procedure now handles tables with Variant columns](https://github.com/apache/iceberg/pull/15964) correctly. + +#### Kafka Connect + +- **Variant type ingestion**: [Semi-structured JSON data](https://github.com/apache/iceberg/pull/15283) can now be ingested directly into Iceberg's Variant type +- **Table UUID validation on commit**: [Verifies table identity at commit time](https://github.com/apache/iceberg/pull/14979) to prevent writing to stale or replaced tables +- **SMT offset tracking fix**: [Fixes a bug](https://github.com/apache/iceberg/pull/15880) where Single Message Transforms that modified the record topic could cause duplicate records + +### Vendor Integrations + +Several cloud provider integrations receive new capabilities and authentication improvements in this release. + +**AWS**: The [KMS endpoint is now configurable via the `kms.endpoint` property](https://github.com/apache/iceberg/pull/14246), allowing connections to custom or private KMS endpoints. [Chunked encoding for S3 requests is now configurable](https://github.com/apache/iceberg/pull/15242), useful for workloads that need to control how data is streamed to S3. When a custom credential provider is configured, [it is now preferred over the default provider chain](https://github.com/apache/iceberg/pull/15249), giving users explicit control over authentication order. 
[Proxy configuration can now be set via system properties and environment variables](https://github.com/apache/iceberg/pull/15506) for HTTP clients. [Retry policies are now applied to Glue and DynamoDB clients](https://github.com/apache/iceberg/pull/15094), improving reliability in high-throughput catalog operations. + +**Azure**: A [KeyManagementClient implementation for Azure Key Vault](https://github.com/apache/iceberg/pull/13186) adds native Azure KMS support for Iceberg's envelope encryption. [Token credential provider support](https://github.com/apache/iceberg/pull/14136) allows specifying a custom token credential provider class, enabling flexible authentication with Azure services including managed identity and service principal flows. + +**GCP**: [Service account impersonation is now supported in BigQueryMetastoreCatalog](https://github.com/apache/iceberg/pull/14447), allowing a base service account to impersonate another when accessing BigQuery metadata. The new [`gcp.auth.credentials-key` property](https://github.com/apache/iceberg/pull/14713) allows passing a base64-encoded service account key directly in catalog configuration. [BigQueryMetastoreCatalog now uses ETag-based conflict detection](https://github.com/apache/iceberg/pull/14940) when committing table updates, eliminating a redundant table load and preventing lost updates under concurrent writes.

Review Comment:

```suggestion
**GCP**: [Service account impersonation is now supported in BigQueryMetastoreCatalog](https://github.com/apache/iceberg/pull/14447), allowing a base service account to impersonate another when accessing BigQuery metadata. The new [`gcp.auth.credentials-key` property](https://github.com/apache/iceberg/pull/14713) allows passing a base64-encoded service account key directly in catalog configuration. [BigQueryMetastoreCatalog now uses ETag-based conflict detection](https://github.com/apache/iceberg/pull/14940) when committing table updates, eliminating a redundant table load and preventing lost updates under concurrent writes. Added [GCS Analytics Core](https://github.com/apache/iceberg/pull/14333) that optimizes analytics workloads on Google Cloud Storage with Parquet footer prefetching, small-object caching, and parallel vectored reads.
```
