Re: [PR] Add Flink 2.3.0 release [flink-web]

via GitHub Mon, 01 Jun 2026 06:23:16 -0700


alpinegizmo commented on code in PR #853:
URL: https://github.com/apache/flink-web/pull/853#discussion_r3334441330



##########
docs/content/posts/2026-06-10-release-2.3.0.md:
##########
@@ -0,0 +1,308 @@
+---
+title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, 
Improved Performance, and Application Management"
+date: "2026-06-10T00:00:00.000Z"
+aliases:
+- /news/2026/06/10/release-2.3.0.html
+authors:
+- flink:
+  name: "Apache Flink PMC"
+
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0.
+
+This release significantly expands SQL capabilities with changelog conversion 
operators, enhances materialized table flexibility, introduces an experimental, 
high-performance native S3 filesystem, and delivers application management. 
Flink 2.3.0 brings together contributors from around the globe, implements full 
or core functionalities of 15 FLIPs (Flink Improvement Proposals), and resolves 
numerous issues and enhancements.
+
+Key improvements in this release include new SQL operators for changelog 
manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over 
materialized table refresh strategies, adaptive partition selection for 
optimized backpressure handling, and a completely redesigned S3 filesystem 
built on AWS SDK v2. The introduction of application-level lifecycle management 
provides better visibility and control for production deployments, while 
enhanced watermark alignment can dramatically improve backlog processing 
performance. A reworked SinkUpsertMaterializer brings much improved performance 
for some Flink SQL workloads. 
+
+We extend our heartfelt thanks to all contributors for making this release 
possible!
+
+Let's dive into the details.
+
+# Flink SQL Improvements
+
+## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog 
Tables
+
+The DataStream API has long offered `toChangelogStream()` and 
`fromChangelogStream()` for working
+with changelog streams; Flink 2.3 brings equivalent functionality to SQL via 
two new built-in
+Process Table Functions:
+
+- `FROM_CHANGELOG` converts an append-only stream that carries an operation 
column into a dynamic
+  table. A configurable `op_mapping` makes it straightforward to plug in 
custom CDC formats and
+  controls how rows with unmapped operation codes are treated.
+- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an 
append-only
+  changelog stream. This is the first SQL-level operator that lets users 
convert retract or
+  upsert streams into append form — useful for archival, audit, writing to 
append-only sinks,
+  and working around pipelines that require an append-only table.
+
+The 2.3 release covers limited basic use cases for both. Future versions will 
extend both
+functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` 
and more to make
+both features powerful and extensive. 
+
+**More Information**
+* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G)
+
+## Materialized Table Evolution: DDL Parity and Refresh Control
+
+Flink 2.3 brings materialized tables to feature parity with regular tables 
through two major enhancements.
+
+First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, 
including watermarks and primary keys, just like regular tables. `ALTER 
MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` 
operations for metadata and computed columns, plus `RENAME TO`, allowing 
materialized tables to evolve through the same workflow already used for 
regular Flink tables.
+
+Second, Flink 2.3 introduces granular control over data reprocessing when a 
materialized table's query changes. The new `START_MODE` clause lets you choose 
exactly where the refresh pipeline begins. There is also special support for 
attempting to resume processing from the exact source offsets where the 
previous job instance stopped.
+
+These enhancements eliminate the need to drop and recreate materialized tables 
when query definitions evolve, and prevent unnecessary reprocessing of 
historical data when iterating on pipeline logic. While full reprocessing is 
sometimes unavoidable, specifically when the query optimizer generates a 
completely new physical plan, forcing this behavior for every evolution is both 
unnecessary and costly. For many common evolutions, such as adding a nullable 
column or making compatible logic changes, re-ingesting historical data can now 
be avoided.
+
+**More Information**
+* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw)
+* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw)
+
+## SinkUpsertMaterializer: Explicit Conflict Handling
+
+The SinkUpsertMaterializer is required when the upsert key (the unique 
identifier provided by the stream) is different from the primary key (the 
unique identifier in the target sink table). This happens in scenarios like 
multi-stage transformations, projections, or joins.
+
+Previously, the SinkUpsertMaterializer has had poor performance and high 
resource consumption because it keeps the full history of updates for the 
primary key, leading to unbounded state growth. Flink 2.3 addresses this with 
two key improvements.
+
+By default, queries now fail at planning time when upsert and primary keys 
differ, requiring you to explicitly choose a conflict strategy. This is done 
with a new `ON CONFLICT` clause that makes the behavior explicit. You choose 
how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), 
or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has 
done until now):
+
+```sql
+INSERT INTO target_table
+SELECT * FROM source
+ON CONFLICT DO DEDUPLICATE;
+```
+
+Second, watermark-based compaction reduces state size by cleaning up old 
changelog records that can no longer affect the final result. Two new 
configuration options control the compaction behavior:
+
+  - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — 
`WATERMARK` or
+    `CHECKPOINT`.
+  - `table.exec.sink.upserts.compaction-interval` — optional fallback interval 
for emitting
+    watermarks when none arrive naturally.
+
+**More Information**
+* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw)
+
+## Process Table Function Enhancements
+
+Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities 
that align them with the DataStream API:
+
+- **Late data handling**: PTFs can now react to late records instead of 
silently dropping them, enabling custom late data strategies at the SQL level.
+- **Ordered table arguments**: The new `ORDER BY` clause on table arguments 
ensures PTFs receive rows in deterministic temporal order within each partition:
+
+```sql
+SELECT * FROM 
+  MyTimestampedPtf(
+    input => TABLE events PARTITION BY user_id ORDER BY event_time
+  );
+```
+
+Flink 2.3 includes only these enhancements that enable more sophisticated 
temporal processing patterns and late data handling directly in SQL. The other 
features described in FLIP-565, around improved state access and broadcast 
state, are expected to follow in a future release.
+
+**More Information**
+* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G)
+
+## ARTIFACT Keyword for User-Defined Functions
+
+The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a 
future-proof alternative to `JAR`. This generic keyword prepares the syntax for 
future ecosystem assets like Python wheels, while remaining fully backward 
compatible:
+
+```sql
+-- New syntax
+CREATE FUNCTION my_func AS 'com.example.MyUdf'
+  USING ARTIFACT 's3://bucket/path/my-udf.jar';
+
+-- Existing syntax still works
+CREATE FUNCTION my_func AS 'com.example.MyUdf'
+  USING JAR 's3://bucket/path/my-udf.jar';
+```
+
+**More Information**
+* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw)
+
+## Critical Bug Fix: MiniBatch Aggregation Record Loss
+
+Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could 
silently drop records when mini-batch aggregation was enabled and the planner 
used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's 
mini-batch contained only retractions with no existing state—the function would 
incorrectly return early, dropping all remaining keys in the bundle. This has 
been corrected to ensure all keys are processed.
+
+**More Information**
+* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661)
+
+# Connectors
+
+## Native S3 FileSystem
+
+Flink 2.3 introduces a ground-up S3 Filesystem with `flink-s3-fs-native`, a 
new plugin built directly on AWS SDK v2. This replaces the Hadoop and 
Presto-based connectors with a modern implementation that delivers:
+
+- **Better performance**: See [Benchmarking Native S3 FileSystem 
(flink-s3-fs-native) vs Presto S3 
(flink-s3-fs-presto)](https://cwiki.apache.org/confluence/x/7Ig8G) for details
+- **Native AWS integration**: IAM Roles for Service Accounts (IRSA), modern 
credential providers, and direct SDK v2 integration
+- **Non-blocking I/O**: Asynchronous operations for improved throughput
+- **Unified implementation**: Single plugin provides both `FileSystem` and 
`RecoverableWriter` (exactly-once streaming sinks)
+- **Zero Hadoop dependencies**: No dependency mess with smaller footprint
+
+The plugin registers the standard `s3://` and `s3a://` scheme and uses a new 
`s3.*` configuration namespace:
+
+```yaml
+s3.region: us-west-2
+s3.endpoint: https://s3.us-west-2.amazonaws.com
+s3.path-style-access: false
+s3.upload.min.part.size: 5MB
+s3.upload.max.concurrent.uploads: 4
+s3.async.enabled: true
+s3.read.buffer.size: 64KB
+s3.bulk-copy.enabled: true
+```
+
+Additional configuration supports SSE-KMS encryption, chunked encoding, 
checksum validation, and entropy injection for high-throughput workloads.
+
+Note: The Native S3 FileSystem is experimental in Flink 2.3. It is 
functionally complete and has demonstrated strong performance in benchmarks, 
but more experience is needed before it can be recommended for use in 
production.

Review Comment:
   @Samrat002 Does this sound right to you?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add Flink 2.3.0 release [flink-web]

Reply via email to