Re: [PR] Add Flink 2.3.0 release [flink-web]

via GitHub Tue, 02 Jun 2026 15:53:21 -0700


1996fanrui commented on code in PR #853:
URL: https://github.com/apache/flink-web/pull/853#discussion_r3344925843



##########
docs/content/posts/2026-06-10-release-2.3.0.md:
##########
@@ -0,0 +1,308 @@
+---
+title: "Apache Flink 2.3.0: Enhanced SQL Capabilities, Native S3 Support, 
Improved Performance, and Enterprise-Grade Application Management"
+date: "2026-06-10T00:00:00.000Z"
+aliases:
+- /news/2026/06/10/release-2.3.0.html
+authors:
+- flink:
+  name: "Apache Flink PMC"
+
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 2.3.0.
+
+This release significantly expands SQL capabilities with changelog conversion 
operators, enhances materialized table flexibility, introduces an experimental, 
high-performance native S3 filesystem, and delivers enterprise-grade 
application management. Flink 2.3.0 brings together contributors from around 
the globe, implements 15 FLIPs (Flink Improvement Proposals), and resolves 
numerous issues and enhancements.
+
+Key improvements in this release include new SQL operators for changelog 
manipulation (`FROM_CHANGELOG` and `TO_CHANGELOG`), fine-grained control over 
materialized table refresh strategies, adaptive partition selection for 
optimized backpressure handling, and a completely redesigned S3 filesystem 
built on AWS SDK v2. The introduction of application-level lifecycle management 
provides better visibility and control for production deployments, while 
enhanced watermark alignment can dramatically improve backlog processing 
performance. A reworked SinkUpsertMaterializer brings much improved performance 
for some Flink SQL workloads. 
+
+We extend our heartfelt thanks to all contributors for making this release 
possible!
+
+Let's dive into the details.
+
+# Flink SQL Improvements
+
+## FROM_CHANGELOG and TO_CHANGELOG: Bridging Append-only and Dynamic Changelog 
Tables (FLIP-564)
+
+The DataStream API has long offered `toChangelogStream()` and 
`fromChangelogStream()` for working
+with changelog streams; Flink 2.3 brings equivalent functionality to SQL via 
two new built-in
+Process Table Functions:
+
+- `FROM_CHANGELOG` converts an append-only stream that carries an operation 
column into a dynamic
+  table. A configurable `op_mapping` makes it straightforward to plug in 
custom CDC formats and
+  controls how rows with unmapped operation codes are treated.
+- `TO_CHANGELOG` is the inverse: it materializes a dynamic table back into an 
append-only
+  changelog stream. This is the first SQL-level operator that lets users 
convert retract or
+  upsert streams into append form — useful for archival, audit, writing to 
append-only sinks,
+  and working around pipelines that require an append-only table.
+
+The 2.3 release covers limited basic use cases for both. Future versions will 
extend both
+functions with `PARTITION BY`, `invalid_op_handling`, `produces_full_deletes` 
and more to make
+both features powerful and extensive. 
+
+**More Information**
+* [FLIP-564](https://cwiki.apache.org/confluence/x/34k8G)
+
+## Materialized Table Evolution: DDL Parity and Refresh Control (FLIP-550 and 
FLIP-557)
+
+Flink 2.3 brings materialized tables to feature parity with regular tables 
through two major enhancements.
+
+First, `CREATE MATERIALIZED TABLE` now accepts explicit column definitions, 
including watermarks and primary keys, just like regular tables. `ALTER 
MATERIALIZED TABLE` gains full DDL capabilities—`ADD`, `MODIFY`, and `DROP` 
operations for metadata and computed columns, plus `RENAME TO`, allowing 
materialized tables to evolve through the same workflow already used for 
regular Flink tables.
+
+Second, Flink 2.3 introduces granular control over data reprocessing when a 
materialized table's query changes. The new `START_MODE` clause lets you choose 
exactly where the refresh pipeline begins. There is also special support for 
attempting to resume processing from the exact source offsets where the 
previous job instance stopped.
+
+These enhancements eliminate the need to drop and recreate materialized tables 
when query definitions evolve, and prevent unnecessary reprocessing of 
historical data when iterating on pipeline logic. While full reprocessing is 
sometimes unavoidable, specifically when the query optimizer generates a 
completely new physical plan, forcing this behavior for every evolution is both 
unnecessary and costly. For many common evolutions, such as adding a nullable 
column or making compatible logic changes, re-ingesting historical data can now 
be avoided.
+
+**More Information**
+* [FLIP-550](https://cwiki.apache.org/confluence/x/XwobFw)
+* [FLIP-557](https://cwiki.apache.org/confluence/x/9oPMFw)
+
+## SinkUpsertMaterializer: Explicit Conflict Handling (FLIP-558)
+
+The SinkUpsertMaterializer is required when the upsert key (the unique 
identifier provided by the stream) is different from the primary key (the 
unique identifier in the target sink table). This happens in scenarios like 
multi-stage transformations, projections, or joins.
+
+Previously, the SinkUpsertMaterializer has had poor performance and high 
resource consumption because it keeps the full history of updates for the 
primary key, leading to unbounded state growth. Flink 2.3 addresses this with 
two key improvements.
+
+By default, queries now fail at planning time when upsert and primary keys 
differ, requiring you to explicitly choose a conflict strategy. This is done 
with a new `ON CONFLICT` clause that makes the behavior explicit. You choose 
how to handle conflicts: `DO NOTHING` (silent skip), `DO ERROR` (fail the job), 
or `DO DEDUPLICATE` (materialize and deduplicate, similar to what Flink has 
done until now):
+
+```sql
+INSERT INTO target_table
+SELECT * FROM source
+ON CONFLICT DO DEDUPLICATE;
+```
+
+Second, watermark-based compaction reduces state size by cleaning up old 
changelog records that can no longer affect the final result. Two new 
configuration options control the compaction behavior:
+
+  - `table.exec.sink.upserts.compaction-mode` (default: `WATERMARK`) — 
`WATERMARK` or
+    `CHECKPOINT`.
+  - `table.exec.sink.upserts.compaction-interval` — optional fallback interval 
for emitting
+    watermarks when none arrive naturally.
+
+**More Information**
+* [FLIP-558](https://cwiki.apache.org/confluence/x/NoTMFw)
+
+## Process Table Function Enhancements (FLIP-565)
+
+Process Table Functions (PTFs), introduced in Flink 2.1, gain new capabilities 
that align them with the DataStream API:
+
+- **Late data handling**: PTFs can now react to late records instead of 
silently dropping them, enabling custom late data strategies at the SQL level.
+- **Ordered table arguments**: The new `ORDER BY` clause on table arguments 
ensures PTFs receive rows in deterministic temporal order within each partition:
+
+```sql
+SELECT * FROM 
+  MyTimestampedPtf(
+    input => TABLE events PARTITION BY user_id ORDER BY event_time
+  );
+```
+
+These enhancements enable more sophisticated temporal processing patterns 
directly in SQL.
+
+**More Information**
+* [FLIP-565](https://cwiki.apache.org/confluence/x/qIo8G)
+
+## ARTIFACT Keyword for User-Defined Functions (FLIP-559)
+
+The `USING` clause of `CREATE FUNCTION` now accepts an `ARTIFACT` keyword as a 
future-proof alternative to `JAR`. This generic keyword prepares the syntax for 
future ecosystem assets like Python wheels, while remaining fully backward 
compatible:
+
+```sql
+-- New syntax
+CREATE FUNCTION my_func AS 'com.example.MyUdf'
+  USING ARTIFACT 's3://bucket/path/my-udf.jar';
+
+-- Existing syntax still works
+CREATE FUNCTION my_func AS 'com.example.MyUdf'
+  USING JAR 's3://bucket/path/my-udf.jar';
+```
+
+**More Information**
+* [FLIP-559](https://cwiki.apache.org/confluence/x/64TMFw)
+
+## Critical Bug Fix: MiniBatch Aggregation Record Loss
+
+Flink 2.3 fixes a critical bug in `MiniBatchGroupAggFunction` that could 
silently drop records when mini-batch aggregation was enabled and the planner 
used a `ONE_PHASE` aggregation strategy. The bug occurred when a key's 
mini-batch contained only retractions with no existing state—the function would 
incorrectly return early, dropping all remaining keys in the bundle. This has 
been corrected to ensure all keys are processed.
+
+**More Information**
+* [FLINK-35661](https://issues.apache.org/jira/browse/FLINK-35661)
+
+# Connectors
+
+## Native S3 FileSystem
+
+Flink 2.3 introduces a ground-up rewrite of S3 connectivity with 
`flink-s3-fs-native`, a new plugin built directly on AWS SDK v2. This replaces 
the Hadoop and Presto-based connectors with a modern implementation that 
delivers:
+
+- **Native AWS integration**: IAM Roles for Service Accounts (IRSA), modern 
credential providers, and direct SDK v2 integration
+- **Non-blocking I/O**: Asynchronous operations for improved throughput
+- **Unified implementation**: Single plugin provides both `FileSystem` and 
`RecoverableWriter` (exactly-once streaming sinks)
+- **Zero Hadoop dependencies**: Cleaner deployment with smaller footprint
+
+The plugin registers the standard `s3://` scheme and uses a new `s3.*` 
configuration namespace:
+
+```yaml
+s3.region: us-west-2
+s3.endpoint: https://s3.us-west-2.amazonaws.com
+s3.path-style-access: false
+s3.upload.min.part.size: 5MB
+s3.upload.max.concurrent.uploads: 4
+s3.async.enabled: true
+s3.read.buffer.size: 64KB
+s3.bulk-copy.enabled: true
+```
+
+Additional configuration supports SSE-KMS encryption, chunked encoding, 
checksum validation, and entropy injection for high-throughput workloads.
+
+**More Information**
+* [FLIP-555](https://cwiki.apache.org/confluence/x/uYqmFw)
+* [Native S3 FileSystem 
Documentation](https://nightlies.apache.org/flink/flink-docs-release-2.3/docs/deployment/filesystems/s3/)
+
+# Runtime Improvements
+
+## Adaptive Partition Selection for RPC-Heavy Workloads (FLIP-339)
+
+When upstream and downstream parallelism differ, Flink uses 
`RebalancePartitioner`, which selects
+target channels round-robin. For jobs that interact with external RPC services 
(Redis, HBase,
+LLM serving, etc.) round-robin selection causes severe backpressure as soon as 
a single
+downstream subtask slows down — the partitioner keeps feeding it new data even 
though it is
+already overloaded.
+
+Flink 2.3 adds an adaptive, load-aware partition-selection mode for
+`StreamPartitioner` that routes records to the least-loaded downstream channel 
instead. In
+benchmarks, this delivers up to ~3x throughput improvement under skewed 
downstream processing.
+
+The feature is opt-in via two new options:
+
+- `taskmanager.network.adaptive-partitioner.enabled` (default: `false`)
+- `taskmanager.network.adaptive-partitioner.max-traverse-size` (default: `4`) 
— number of
+  channels examined when selecting the idlest target.
+
+**More Information**
+* [FLIP-339](https://cwiki.apache.org/confluence/x/nYyzDw)
+
+## AdaptiveScheduler Rescale History and Web UI (FLIP-487 and FLIP-495)
+
+Streaming jobs using the adaptive scheduler now record a complete history of 
rescale events, including parallelism changes, slot allocations, scheduler 
state transitions, and termination reasons. This data is available through new 
REST endpoints and visualized in a dedicated "Rescales" tab in the Flink Web UI.
+
+The UI exposes:
+- Rescale event counts and timeline
+- Duration statistics with percentiles
+- Per-event details (parallelism before/after, triggered-by reason)
+- Adaptive scheduler configuration
+
+```yaml
+# Enable rescale history (disabled by default)
+web.adaptive-scheduler.rescale-history.size: 100
+```
+
+Setting the history size to a positive integer activates the feature.
+
+**More Information**
+* [FLIP-487](https://cwiki.apache.org/confluence/x/vZCMEw)
+* [FLIP-495](https://cwiki.apache.org/confluence/x/TQr0Ew)
+* [Elastic Scaling 
Documentation](https://nightlies.apache.org/flink/flink-docs-release-2.3/docs/deployment/elastic_scaling/)
+
+## Watermark Alignment for Fast Backlog Processing (FLINK-37399)
+
+Prior to Flink 2.3, watermark alignment due to the announcement delays was 
inadvertently
+limiting how quickly a job could process a backlog. For example with max 
allowed drift
+configured to 30s and watermark alignment updated every ~1s, prior to Flink 
2.3 watermark
+alignment was de facto capping the backlog processing speed to:
+
+> 30 "event time" seconds per each 1 "real world" second
+
+In Flink 2.3 the watermark alignment was redesigned to solve those 
announcement delays by the
+introduction of a watermark alignment buffer. By default this buffer has a 
size of 3 and it
+delays the application of the watermark alignment algorithm by 3 update 
intervals. This means
+in Flink 2.3+, by default watermark alignment will be pausing sources a couple 
of seconds later
+than it used to, potentially increasing state size of windowed and temporal 
operators.
+However this should be negligible for all practical use cases. Nevertheless, 
the size of this
+buffer can be configured:
+
+```yaml
+# Control alignment buffer size (3 by default)
+pipeline.watermark-alignment.buffer-size: 3
+
+# Set to 0 to restore Flink 2.2 behavior
+pipeline.watermark-alignment.buffer-size: 0
+```
+
+**More Information**
+* [FLINK-37399](https://issues.apache.org/jira/browse/FLINK-37399)
+
+## Checkpointing During Recovery (FLIP-547)

Review Comment:
   Thanks for the ping, it is fine for me. 
   
   It is same with the release note from 
https://issues.apache.org/jira/browse/FLINK-35761



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add Flink 2.3.0 release [flink-web]

Reply via email to