Samrat002 commented on code in PR #853: URL: https://github.com/apache/flink-web/pull/853#discussion_r3332040851
########## docs/content/posts/2026-06-10-release-2.3.0.md: ########## @@ -0,0 +1,308 @@ +--- +title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, Improved Performance, and Enterprise-Grade Application Management" +date: "2026-06-10T00:00:00.000Z" +aliases: +- /news/2026/06/10/release-2.3.0.html +authors: +- flink: + name: "Apache Flink PMC" + +--- + +The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0. + +This release significantly expands SQL capabilities with changelog conversion operators, enhances materialized table flexibility, introduces an experimental, high-performance native S3 filesystem, and delivers enterprise-grade application management. Flink 2.3.0 brings together contributors from around the globe, implements 15 FLIPs (Flink Improvement Proposals), and resolves numerous issues and enhancements. + +Key improvements in this release include new SQL operators for changelog manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over materialized table refresh strategies, adaptive partition selection for optimized backpressure handling, and a completely redesigned S3 filesystem built on AWS SDK v2. The introduction of application-level lifecycle management provides better visibility and control for production deployments, while enhanced watermark alignment can dramatically improve backlog processing performance. A reworked SinkUpsertMaterializer brings much improved performance for some Flink SQL workloads. + +We extend our heartfelt thanks to all contributors for making this release possible! + +Let's dive into the details. + +# Flink SQL Improvements + +## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog Tables (FLIP-564) + +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: + +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column into a dynamic + table. A configurable `op_mapping` makes it straightforward to plug in custom CDC formats and + controls how rows with unmapped operation codes are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert retract or + upsert streams into append form — useful for archival, audit, writing to append-only sinks, + and working around pipelines that require an append-only table. + +The 2.3 release covers limited basic use cases for both. Future versions will extend both +functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` and more to make +both features powerful and extensive. + +**More Information** +* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G) + +## Materialized Table Evolution: DDL Parity and Refresh Control (FLIP-550 and FLIP-557) + +Flink 2.3 brings materialized tables to feature parity with regular tables through two major enhancements. + +First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, including watermarks and primary keys, just like regular tables. `ALTER MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` operations for metadata and computed columns, plus `RENAME TO`, allowing materialized tables to evolve through the same workflow already used for regular Flink tables. + +Second, Flink 2.3 introduces granular control over data reprocessing when a materialized table's query changes. The new `START_MODE` clause lets you choose exactly where the refresh pipeline begins. There is also special support for attempting to resume processing from the exact source offsets where the previous job instance stopped. + +These enhancements eliminate the need to drop and recreate materialized tables when query definitions evolve, and prevent unnecessary reprocessing of historical data when iterating on pipeline logic. While full reprocessing is sometimes unavoidable, specifically when the query optimizer generates a completely new physical plan, forcing this behavior for every evolution is both unnecessary and costly. For many common evolutions, such as adding a nullable column or making compatible logic changes, re-ingesting historical data can now be avoided. + +**More Information** +* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw) +* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw) + +## SinkUpsertMaterializer: Explicit Conflict Handling (FLIP-558) + +The SinkUpsertMaterializer is required when the upsert key (the unique identifier provided by the stream) is different from the primary key (the unique identifier in the target sink table). This happens in scenarios like multi-stage transformations, projections, or joins. + +Previously, the SinkUpsertMaterializer has had poor performance and high resource consumption because it keeps the full history of updates for the primary key, leading to unbounded state growth. Flink 2.3 addresses this with two key improvements. + +By default, queries now fail at planning time when upsert and primary keys differ, requiring you to explicitly choose a conflict strategy. This is done with a new `ON CONFLICT` clause that makes the behavior explicit. You choose how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has done until now): + +```sql +INSERT INTO target_table +SELECT * FROM source +ON CONFLICT DO DEDUPLICATE; +``` + +Second, watermark-based compaction reduces state size by cleaning up old changelog records that can no longer affect the final result. Two new configuration options control the compaction behavior: + + - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — `WATERMARK` or + `CHECKPOINT`. + - `table.exec.sink.upserts.compaction-interval` — optional fallback interval for emitting + watermarks when none arrive naturally. + +**More Information** +* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw) + +## Process Table Function Enhancements (FLIP-565) + +Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities that align them with the DataStream API: + +- **Late data handling**: PTFs can now react to late records instead of silently dropping them, enabling custom late data strategies at the SQL level. +- **Ordered table arguments**: The new `ORDER BY` clause on table arguments ensures PTFs receive rows in deterministic temporal order within each partition: + +```sql +SELECT * FROM + MyTimestampedPtf( + input => TABLE events PARTITION BY user_id ORDER BY event_time + ); +``` + +These enhancements enable more sophisticated temporal processing patterns directly in SQL. + +**More Information** +* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G) + +## ARTIFACT Keyword for User-Defined Functions (FLIP-559) + +The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a future-proof alternative to `JAR`. This generic keyword prepares the syntax for future ecosystem assets like Python wheels, while remaining fully backward compatible: + +```sql +-- New syntax +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING ARTIFACT 's3://bucket/path/my-udf.jar'; + +-- Existing syntax still works +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING JAR 's3://bucket/path/my-udf.jar'; +``` + +**More Information** +* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw) + +## Critical Bug Fix: MiniBatch Aggregation Record Loss + +Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could silently drop records when mini-batch aggregation was enabled and the planner used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's mini-batch contained only retractions with no existing state—the function would incorrectly return early, dropping all remaining keys in the bundle. This has been corrected to ensure all keys are processed. + +**More Information** +* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661) + +# Connectors + +## Native S3 FileSystem + +Flink 2.3 introduces a ground-up rewrite of S3 connectivity with `flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces the Hadoop and Presto-based connectors with a modern implementation that delivers: Review Comment: ```suggestion Flink 2.3 introduces a ground-up S3 Filesystem with `flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces the Hadoop and Presto-based connectors with a modern implementation that delivers: ``` ########## docs/content/posts/2026-06-10-release-2.3.0.md: ########## @@ -0,0 +1,308 @@ +--- +title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, Improved Performance, and Enterprise-Grade Application Management" +date: "2026-06-10T00:00:00.000Z" +aliases: +- /news/2026/06/10/release-2.3.0.html +authors: +- flink: + name: "Apache Flink PMC" + +--- + +The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0. + +This release significantly expands SQL capabilities with changelog conversion operators, enhances materialized table flexibility, introduces an experimental, high-performance native S3 filesystem, and delivers enterprise-grade application management. Flink 2.3.0 brings together contributors from around the globe, implements 15 FLIPs (Flink Improvement Proposals), and resolves numerous issues and enhancements. + +Key improvements in this release include new SQL operators for changelog manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over materialized table refresh strategies, adaptive partition selection for optimized backpressure handling, and a completely redesigned S3 filesystem built on AWS SDK v2. The introduction of application-level lifecycle management provides better visibility and control for production deployments, while enhanced watermark alignment can dramatically improve backlog processing performance. A reworked SinkUpsertMaterializer brings much improved performance for some Flink SQL workloads. + +We extend our heartfelt thanks to all contributors for making this release possible! + +Let's dive into the details. + +# Flink SQL Improvements + +## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog Tables (FLIP-564) + +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: + +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column into a dynamic + table. A configurable `op_mapping` makes it straightforward to plug in custom CDC formats and + controls how rows with unmapped operation codes are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert retract or + upsert streams into append form — useful for archival, audit, writing to append-only sinks, + and working around pipelines that require an append-only table. + +The 2.3 release covers limited basic use cases for both. Future versions will extend both +functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` and more to make +both features powerful and extensive. + +**More Information** +* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G) + +## Materialized Table Evolution: DDL Parity and Refresh Control (FLIP-550 and FLIP-557) + +Flink 2.3 brings materialized tables to feature parity with regular tables through two major enhancements. + +First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, including watermarks and primary keys, just like regular tables. `ALTER MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` operations for metadata and computed columns, plus `RENAME TO`, allowing materialized tables to evolve through the same workflow already used for regular Flink tables. + +Second, Flink 2.3 introduces granular control over data reprocessing when a materialized table's query changes. The new `START_MODE` clause lets you choose exactly where the refresh pipeline begins. There is also special support for attempting to resume processing from the exact source offsets where the previous job instance stopped. + +These enhancements eliminate the need to drop and recreate materialized tables when query definitions evolve, and prevent unnecessary reprocessing of historical data when iterating on pipeline logic. While full reprocessing is sometimes unavoidable, specifically when the query optimizer generates a completely new physical plan, forcing this behavior for every evolution is both unnecessary and costly. For many common evolutions, such as adding a nullable column or making compatible logic changes, re-ingesting historical data can now be avoided. + +**More Information** +* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw) +* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw) + +## SinkUpsertMaterializer: Explicit Conflict Handling (FLIP-558) + +The SinkUpsertMaterializer is required when the upsert key (the unique identifier provided by the stream) is different from the primary key (the unique identifier in the target sink table). This happens in scenarios like multi-stage transformations, projections, or joins. + +Previously, the SinkUpsertMaterializer has had poor performance and high resource consumption because it keeps the full history of updates for the primary key, leading to unbounded state growth. Flink 2.3 addresses this with two key improvements. + +By default, queries now fail at planning time when upsert and primary keys differ, requiring you to explicitly choose a conflict strategy. This is done with a new `ON CONFLICT` clause that makes the behavior explicit. You choose how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has done until now): + +```sql +INSERT INTO target_table +SELECT * FROM source +ON CONFLICT DO DEDUPLICATE; +``` + +Second, watermark-based compaction reduces state size by cleaning up old changelog records that can no longer affect the final result. Two new configuration options control the compaction behavior: + + - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — `WATERMARK` or + `CHECKPOINT`. + - `table.exec.sink.upserts.compaction-interval` — optional fallback interval for emitting + watermarks when none arrive naturally. + +**More Information** +* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw) + +## Process Table Function Enhancements (FLIP-565) + +Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities that align them with the DataStream API: + +- **Late data handling**: PTFs can now react to late records instead of silently dropping them, enabling custom late data strategies at the SQL level. +- **Ordered table arguments**: The new `ORDER BY` clause on table arguments ensures PTFs receive rows in deterministic temporal order within each partition: + +```sql +SELECT * FROM + MyTimestampedPtf( + input => TABLE events PARTITION BY user_id ORDER BY event_time + ); +``` + +These enhancements enable more sophisticated temporal processing patterns directly in SQL. + +**More Information** +* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G) + +## ARTIFACT Keyword for User-Defined Functions (FLIP-559) + +The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a future-proof alternative to `JAR`. This generic keyword prepares the syntax for future ecosystem assets like Python wheels, while remaining fully backward compatible: + +```sql +-- New syntax +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING ARTIFACT 's3://bucket/path/my-udf.jar'; + +-- Existing syntax still works +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING JAR 's3://bucket/path/my-udf.jar'; +``` + +**More Information** +* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw) + +## Critical Bug Fix: MiniBatch Aggregation Record Loss + +Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could silently drop records when mini-batch aggregation was enabled and the planner used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's mini-batch contained only retractions with no existing state—the function would incorrectly return early, dropping all remaining keys in the bundle. This has been corrected to ensure all keys are processed. + +**More Information** +* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661) + +# Connectors + +## Native S3 FileSystem + +Flink 2.3 introduces a ground-up rewrite of S3 connectivity with `flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces the Hadoop and Presto-based connectors with a modern implementation that delivers: + +- **Native AWS integration**: IAM Roles for Service Accounts (IRSA), modern credential providers, and direct SDK v2 integration +- **Non-blocking I/O**: Asynchronous operations for improved throughput +- **Unified implementation**: Single plugin provides both `FileSystem` and `RecoverableWriter` (exactly-once streaming sinks) +- **Zero Hadoop dependencies**: Cleaner deployment with smaller footprint + +The plugin registers the standard `s3://` scheme and uses a new `s3.*` configuration namespace: Review Comment: ```suggestion The plugin registers the standard `s3://` and `s3a://` scheme and uses a new `s3.*` configuration namespace: ``` ########## docs/content/posts/2026-06-10-release-2.3.0.md: ########## @@ -0,0 +1,308 @@ +--- +title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, Improved Performance, and Enterprise-Grade Application Management" +date: "2026-06-10T00:00:00.000Z" +aliases: +- /news/2026/06/10/release-2.3.0.html +authors: +- flink: + name: "Apache Flink PMC" + +--- + +The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0. + +This release significantly expands SQL capabilities with changelog conversion operators, enhances materialized table flexibility, introduces an experimental, high-performance native S3 filesystem, and delivers enterprise-grade application management. Flink 2.3.0 brings together contributors from around the globe, implements 15 FLIPs (Flink Improvement Proposals), and resolves numerous issues and enhancements. + +Key improvements in this release include new SQL operators for changelog manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over materialized table refresh strategies, adaptive partition selection for optimized backpressure handling, and a completely redesigned S3 filesystem built on AWS SDK v2. The introduction of application-level lifecycle management provides better visibility and control for production deployments, while enhanced watermark alignment can dramatically improve backlog processing performance. A reworked SinkUpsertMaterializer brings much improved performance for some Flink SQL workloads. + +We extend our heartfelt thanks to all contributors for making this release possible! + +Let's dive into the details. + +# Flink SQL Improvements + +## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog Tables (FLIP-564) + +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: + +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column into a dynamic + table. A configurable `op_mapping` makes it straightforward to plug in custom CDC formats and + controls how rows with unmapped operation codes are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert retract or + upsert streams into append form — useful for archival, audit, writing to append-only sinks, + and working around pipelines that require an append-only table. + +The 2.3 release covers limited basic use cases for both. Future versions will extend both +functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` and more to make +both features powerful and extensive. + +**More Information** +* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G) + +## Materialized Table Evolution: DDL Parity and Refresh Control (FLIP-550 and FLIP-557) + +Flink 2.3 brings materialized tables to feature parity with regular tables through two major enhancements. + +First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, including watermarks and primary keys, just like regular tables. `ALTER MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` operations for metadata and computed columns, plus `RENAME TO`, allowing materialized tables to evolve through the same workflow already used for regular Flink tables. + +Second, Flink 2.3 introduces granular control over data reprocessing when a materialized table's query changes. The new `START_MODE` clause lets you choose exactly where the refresh pipeline begins. There is also special support for attempting to resume processing from the exact source offsets where the previous job instance stopped. + +These enhancements eliminate the need to drop and recreate materialized tables when query definitions evolve, and prevent unnecessary reprocessing of historical data when iterating on pipeline logic. While full reprocessing is sometimes unavoidable, specifically when the query optimizer generates a completely new physical plan, forcing this behavior for every evolution is both unnecessary and costly. For many common evolutions, such as adding a nullable column or making compatible logic changes, re-ingesting historical data can now be avoided. + +**More Information** +* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw) +* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw) + +## SinkUpsertMaterializer: Explicit Conflict Handling (FLIP-558) + +The SinkUpsertMaterializer is required when the upsert key (the unique identifier provided by the stream) is different from the primary key (the unique identifier in the target sink table). This happens in scenarios like multi-stage transformations, projections, or joins. + +Previously, the SinkUpsertMaterializer has had poor performance and high resource consumption because it keeps the full history of updates for the primary key, leading to unbounded state growth. Flink 2.3 addresses this with two key improvements. + +By default, queries now fail at planning time when upsert and primary keys differ, requiring you to explicitly choose a conflict strategy. This is done with a new `ON CONFLICT` clause that makes the behavior explicit. You choose how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has done until now): + +```sql +INSERT INTO target_table +SELECT * FROM source +ON CONFLICT DO DEDUPLICATE; +``` + +Second, watermark-based compaction reduces state size by cleaning up old changelog records that can no longer affect the final result. Two new configuration options control the compaction behavior: + + - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — `WATERMARK` or + `CHECKPOINT`. + - `table.exec.sink.upserts.compaction-interval` — optional fallback interval for emitting + watermarks when none arrive naturally. + +**More Information** +* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw) + +## Process Table Function Enhancements (FLIP-565) + +Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities that align them with the DataStream API: + +- **Late data handling**: PTFs can now react to late records instead of silently dropping them, enabling custom late data strategies at the SQL level. +- **Ordered table arguments**: The new `ORDER BY` clause on table arguments ensures PTFs receive rows in deterministic temporal order within each partition: + +```sql +SELECT * FROM + MyTimestampedPtf( + input => TABLE events PARTITION BY user_id ORDER BY event_time + ); +``` + +These enhancements enable more sophisticated temporal processing patterns directly in SQL. + +**More Information** +* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G) + +## ARTIFACT Keyword for User-Defined Functions (FLIP-559) + +The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a future-proof alternative to `JAR`. This generic keyword prepares the syntax for future ecosystem assets like Python wheels, while remaining fully backward compatible: + +```sql +-- New syntax +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING ARTIFACT 's3://bucket/path/my-udf.jar'; + +-- Existing syntax still works +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING JAR 's3://bucket/path/my-udf.jar'; +``` + +**More Information** +* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw) + +## Critical Bug Fix: MiniBatch Aggregation Record Loss + +Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could silently drop records when mini-batch aggregation was enabled and the planner used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's mini-batch contained only retractions with no existing state—the function would incorrectly return early, dropping all remaining keys in the bundle. This has been corrected to ensure all keys are processed. + +**More Information** +* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661) + +# Connectors + +## Native S3 FileSystem + +Flink 2.3 introduces a ground-up rewrite of S3 connectivity with `flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces the Hadoop and Presto-based connectors with a modern implementation that delivers: + +- **Native AWS integration**: IAM Roles for Service Accounts (IRSA), modern credential providers, and direct SDK v2 integration Review Comment: This plugin has shown better performance compared to existing `flink-s3-fs-hadoop` and `flink-s3-fs-presto` . Reference : https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396&src=contextnavpagetreemode ########## docs/content/posts/2026-06-10-release-2.3.0.md: ########## @@ -0,0 +1,308 @@ +--- +title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, Improved Performance, and Enterprise-Grade Application Management" Review Comment: ```suggestion title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Filesystem Support, Improved Performance, and Application Management" ``` ########## docs/content/posts/2026-06-10-release-2.3.0.md: ########## @@ -0,0 +1,308 @@ +--- +title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, Improved Performance, and Enterprise-Grade Application Management" +date: "2026-06-10T00:00:00.000Z" +aliases: +- /news/2026/06/10/release-2.3.0.html +authors: +- flink: + name: "Apache Flink PMC" + +--- + +The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0. + +This release significantly expands SQL capabilities with changelog conversion operators, enhances materialized table flexibility, introduces an experimental, high-performance native S3 filesystem, and delivers enterprise-grade application management. Flink 2.3.0 brings together contributors from around the globe, implements 15 FLIPs (Flink Improvement Proposals), and resolves numerous issues and enhancements. + +Key improvements in this release include new SQL operators for changelog manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over materialized table refresh strategies, adaptive partition selection for optimized backpressure handling, and a completely redesigned S3 filesystem built on AWS SDK v2. The introduction of application-level lifecycle management provides better visibility and control for production deployments, while enhanced watermark alignment can dramatically improve backlog processing performance. A reworked SinkUpsertMaterializer brings much improved performance for some Flink SQL workloads. + +We extend our heartfelt thanks to all contributors for making this release possible! + +Let's dive into the details. + +# Flink SQL Improvements + +## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog Tables (FLIP-564) + +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: + +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column into a dynamic + table. A configurable `op_mapping` makes it straightforward to plug in custom CDC formats and + controls how rows with unmapped operation codes are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert retract or + upsert streams into append form — useful for archival, audit, writing to append-only sinks, + and working around pipelines that require an append-only table. + +The 2.3 release covers limited basic use cases for both. Future versions will extend both +functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` and more to make +both features powerful and extensive. + +**More Information** +* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G) + +## Materialized Table Evolution: DDL Parity and Refresh Control (FLIP-550 and FLIP-557) + +Flink 2.3 brings materialized tables to feature parity with regular tables through two major enhancements. + +First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, including watermarks and primary keys, just like regular tables. `ALTER MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` operations for metadata and computed columns, plus `RENAME TO`, allowing materialized tables to evolve through the same workflow already used for regular Flink tables. + +Second, Flink 2.3 introduces granular control over data reprocessing when a materialized table's query changes. The new `START_MODE` clause lets you choose exactly where the refresh pipeline begins. There is also special support for attempting to resume processing from the exact source offsets where the previous job instance stopped. + +These enhancements eliminate the need to drop and recreate materialized tables when query definitions evolve, and prevent unnecessary reprocessing of historical data when iterating on pipeline logic. While full reprocessing is sometimes unavoidable, specifically when the query optimizer generates a completely new physical plan, forcing this behavior for every evolution is both unnecessary and costly. For many common evolutions, such as adding a nullable column or making compatible logic changes, re-ingesting historical data can now be avoided. + +**More Information** +* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw) +* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw) + +## SinkUpsertMaterializer: Explicit Conflict Handling (FLIP-558) + +The SinkUpsertMaterializer is required when the upsert key (the unique identifier provided by the stream) is different from the primary key (the unique identifier in the target sink table). This happens in scenarios like multi-stage transformations, projections, or joins. + +Previously, the SinkUpsertMaterializer has had poor performance and high resource consumption because it keeps the full history of updates for the primary key, leading to unbounded state growth. Flink 2.3 addresses this with two key improvements. + +By default, queries now fail at planning time when upsert and primary keys differ, requiring you to explicitly choose a conflict strategy. This is done with a new `ON CONFLICT` clause that makes the behavior explicit. You choose how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has done until now): + +```sql +INSERT INTO target_table +SELECT * FROM source +ON CONFLICT DO DEDUPLICATE; +``` + +Second, watermark-based compaction reduces state size by cleaning up old changelog records that can no longer affect the final result. Two new configuration options control the compaction behavior: + + - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — `WATERMARK` or + `CHECKPOINT`. + - `table.exec.sink.upserts.compaction-interval` — optional fallback interval for emitting + watermarks when none arrive naturally. + +**More Information** +* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw) + +## Process Table Function Enhancements (FLIP-565) + +Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities that align them with the DataStream API: + +- **Late data handling**: PTFs can now react to late records instead of silently dropping them, enabling custom late data strategies at the SQL level. +- **Ordered table arguments**: The new `ORDER BY` clause on table arguments ensures PTFs receive rows in deterministic temporal order within each partition: + +```sql +SELECT * FROM + MyTimestampedPtf( + input => TABLE events PARTITION BY user_id ORDER BY event_time + ); +``` + +These enhancements enable more sophisticated temporal processing patterns directly in SQL. + +**More Information** +* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G) + +## ARTIFACT Keyword for User-Defined Functions (FLIP-559) + +The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a future-proof alternative to `JAR`. This generic keyword prepares the syntax for future ecosystem assets like Python wheels, while remaining fully backward compatible: + +```sql +-- New syntax +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING ARTIFACT 's3://bucket/path/my-udf.jar'; + +-- Existing syntax still works +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING JAR 's3://bucket/path/my-udf.jar'; +``` + +**More Information** +* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw) + +## Critical Bug Fix: MiniBatch Aggregation Record Loss + +Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could silently drop records when mini-batch aggregation was enabled and the planner used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's mini-batch contained only retractions with no existing state—the function would incorrectly return early, dropping all remaining keys in the bundle. This has been corrected to ensure all keys are processed. + +**More Information** +* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661) + +# Connectors + +## Native S3 FileSystem + +Flink 2.3 introduces a ground-up rewrite of S3 connectivity with `flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces the Hadoop and Presto-based connectors with a modern implementation that delivers: Review Comment: This plugin is drop in replacement for Hadoop and Presto-based connectors ########## docs/content/posts/2026-06-10-release-2.3.0.md: ########## @@ -0,0 +1,308 @@ +--- +title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, Improved Performance, and Enterprise-Grade Application Management" +date: "2026-06-10T00:00:00.000Z" +aliases: +- /news/2026/06/10/release-2.3.0.html +authors: +- flink: + name: "Apache Flink PMC" + +--- + +The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0. + +This release significantly expands SQL capabilities with changelog conversion operators, enhances materialized table flexibility, introduces an experimental, high-performance native S3 filesystem, and delivers enterprise-grade application management. Flink 2.3.0 brings together contributors from around the globe, implements 15 FLIPs (Flink Improvement Proposals), and resolves numerous issues and enhancements. + +Key improvements in this release include new SQL operators for changelog manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over materialized table refresh strategies, adaptive partition selection for optimized backpressure handling, and a completely redesigned S3 filesystem built on AWS SDK v2. The introduction of application-level lifecycle management provides better visibility and control for production deployments, while enhanced watermark alignment can dramatically improve backlog processing performance. A reworked SinkUpsertMaterializer brings much improved performance for some Flink SQL workloads. + +We extend our heartfelt thanks to all contributors for making this release possible! + +Let's dive into the details. + +# Flink SQL Improvements + +## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog Tables (FLIP-564) + +The DataStream API has long offered `toChangelogStream()` and `fromChangelogStream()` for working +with changelog streams; Flink 2.3 brings equivalent functionality to SQL via two new built-in +Process Table Functions: + +- `FROM_CHANGELOG` converts an append-only stream that carries an operation column into a dynamic + table. A configurable `op_mapping` makes it straightforward to plug in custom CDC formats and + controls how rows with unmapped operation codes are treated. +- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an append-only + changelog stream. This is the first SQL-level operator that lets users convert retract or + upsert streams into append form — useful for archival, audit, writing to append-only sinks, + and working around pipelines that require an append-only table. + +The 2.3 release covers limited basic use cases for both. Future versions will extend both +functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` and more to make +both features powerful and extensive. + +**More Information** +* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G) + +## Materialized Table Evolution: DDL Parity and Refresh Control (FLIP-550 and FLIP-557) + +Flink 2.3 brings materialized tables to feature parity with regular tables through two major enhancements. + +First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, including watermarks and primary keys, just like regular tables. `ALTER MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` operations for metadata and computed columns, plus `RENAME TO`, allowing materialized tables to evolve through the same workflow already used for regular Flink tables. + +Second, Flink 2.3 introduces granular control over data reprocessing when a materialized table's query changes. The new `START_MODE` clause lets you choose exactly where the refresh pipeline begins. There is also special support for attempting to resume processing from the exact source offsets where the previous job instance stopped. + +These enhancements eliminate the need to drop and recreate materialized tables when query definitions evolve, and prevent unnecessary reprocessing of historical data when iterating on pipeline logic. While full reprocessing is sometimes unavoidable, specifically when the query optimizer generates a completely new physical plan, forcing this behavior for every evolution is both unnecessary and costly. For many common evolutions, such as adding a nullable column or making compatible logic changes, re-ingesting historical data can now be avoided. + +**More Information** +* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw) +* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw) + +## SinkUpsertMaterializer: Explicit Conflict Handling (FLIP-558) + +The SinkUpsertMaterializer is required when the upsert key (the unique identifier provided by the stream) is different from the primary key (the unique identifier in the target sink table). This happens in scenarios like multi-stage transformations, projections, or joins. + +Previously, the SinkUpsertMaterializer has had poor performance and high resource consumption because it keeps the full history of updates for the primary key, leading to unbounded state growth. Flink 2.3 addresses this with two key improvements. + +By default, queries now fail at planning time when upsert and primary keys differ, requiring you to explicitly choose a conflict strategy. This is done with a new `ON CONFLICT` clause that makes the behavior explicit. You choose how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has done until now): + +```sql +INSERT INTO target_table +SELECT * FROM source +ON CONFLICT DO DEDUPLICATE; +``` + +Second, watermark-based compaction reduces state size by cleaning up old changelog records that can no longer affect the final result. Two new configuration options control the compaction behavior: + + - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — `WATERMARK` or + `CHECKPOINT`. + - `table.exec.sink.upserts.compaction-interval` — optional fallback interval for emitting + watermarks when none arrive naturally. + +**More Information** +* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw) + +## Process Table Function Enhancements (FLIP-565) + +Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities that align them with the DataStream API: + +- **Late data handling**: PTFs can now react to late records instead of silently dropping them, enabling custom late data strategies at the SQL level. +- **Ordered table arguments**: The new `ORDER BY` clause on table arguments ensures PTFs receive rows in deterministic temporal order within each partition: + +```sql +SELECT * FROM + MyTimestampedPtf( + input => TABLE events PARTITION BY user_id ORDER BY event_time + ); +``` + +These enhancements enable more sophisticated temporal processing patterns directly in SQL. + +**More Information** +* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G) + +## ARTIFACT Keyword for User-Defined Functions (FLIP-559) + +The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a future-proof alternative to `JAR`. This generic keyword prepares the syntax for future ecosystem assets like Python wheels, while remaining fully backward compatible: + +```sql +-- New syntax +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING ARTIFACT 's3://bucket/path/my-udf.jar'; + +-- Existing syntax still works +CREATE FUNCTION my_func AS 'com.example.MyUdf' + USING JAR 's3://bucket/path/my-udf.jar'; +``` + +**More Information** +* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw) + +## Critical Bug Fix: MiniBatch Aggregation Record Loss + +Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could silently drop records when mini-batch aggregation was enabled and the planner used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's mini-batch contained only retractions with no existing state—the function would incorrectly return early, dropping all remaining keys in the bundle. This has been corrected to ensure all keys are processed. + +**More Information** +* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661) + +# Connectors + +## Native S3 FileSystem + +Flink 2.3 introduces a ground-up rewrite of S3 connectivity with `flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces the Hadoop and Presto-based connectors with a modern implementation that delivers: + +- **Native AWS integration**: IAM Roles for Service Accounts (IRSA), modern credential providers, and direct SDK v2 integration +- **Non-blocking I/O**: Asynchronous operations for improved throughput +- **Unified implementation**: Single plugin provides both `FileSystem` and `RecoverableWriter` (exactly-once streaming sinks) +- **Zero Hadoop dependencies**: Cleaner deployment with smaller footprint Review Comment: ```suggestion - **Zero Hadoop dependencies**: No dependency mess with smaller footprint ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
