lihaosky commented on code in PR #800:
URL: https://github.com/apache/flink-web/pull/800#discussion_r2203642420


##########
docs/content/posts/2025-07-31-release-2.1.0.md:
##########
@@ -0,0 +1,438 @@
+---
+authors:
+  - reswqa:
+    name: "Ron Liu"
+    twitter: "Ron999"
+
+date: "2025-07-31T08:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 2.1
+aliases:
+  - /news/2025/07/31/release-2.1.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 2.1.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new features. Overall, 116
+people contributed to this release, completing 15 FLIPs and 200+ issues. Thank you!
+
+Let's dive into the highlights.
+
+# Flink SQL Improvements
+
+## Realtime AI Function
+
+Since Flink 2.0, we have introduced dedicated syntax for AI models, enabling users to define models
+as easily as creating catalog objects and to invoke them like standard functions or table functions
+in SQL statements.
+
+In Flink 2.1, we expanded the `ML_PREDICT` table-valued function (TVF) to perform real-time model
+inference in SQL queries, applying machine learning models to data streams seamlessly. The
+implementation supports both embedded models (including OpenAI) and custom model providers,
+accelerating Flink's evolution from a real-time data processing engine to a unified real-time AI
+platform. Looking ahead, we plan to introduce more AI functions such as `ML_EVALUATE` and
+`VECTOR_SEARCH` to unlock an end-to-end experience for real-time data processing, model training,
+and inference.
+
+Take the following SQL statements as an example:
+```sql
+-- Declare an AI model
+CREATE MODEL `my_model`
+INPUT (f1 INT, f2 STRING)
+OUTPUT (label STRING, probs ARRAY<FLOAT>)
+WITH(
+  'task' = 'classification',
+  'type' = 'remote',
+  'provider' = 'openai',
+  'openai.endpoint' = 'https://api.openai.com/v1/llm/v1/chat',
+  'openai.api_key' = 'abcdefg'
+);
+
+-- Basic usage
+SELECT * FROM ML_PREDICT(
+  TABLE input_table,
+  MODEL my_model,
+  DESCRIPTOR(feature1, feature2)
+);
+
+-- With configuration options
+SELECT * FROM ML_PREDICT(
+  TABLE input_table,
+  MODEL my_model,
+  DESCRIPTOR(feature1, feature2),
+  MAP['async', 'true', 'timeout', '100s']
+);
+
+-- Using named parameters
+SELECT * FROM ML_PREDICT(
+  INPUT => TABLE input_table,
+  MODEL => MODEL my_model,
+  ARGS => DESCRIPTOR(feature1, feature2),
+  CONFIG => MAP['async', 'true']
+);
+```
+
+**More Information**
+* [FLINK-34992](https://issues.apache.org/jira/browse/FLINK-34992)
+* [FLINK-37777](https://issues.apache.org/jira/browse/FLINK-37777)
+* [FLIP-437](https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL)
+* [FLIP-525](https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design)
+* [Model Inference](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/sql/queries/model-inference/)
+
+## Variant Type
+
+VARIANT is a new data type for semi-structured data (e.g. JSON). It supports storing any
+semi-structured data, including ARRAY, MAP (with STRING keys), and scalar types, while preserving
+field type information in a JSON-like structure. Unlike ROW and STRUCTURED types, VARIANT provides
+superior flexibility for handling deeply nested and evolving schemas.
+
+Users can use `PARSE_JSON` or `TRY_PARSE_JSON` to convert JSON-formatted VARCHAR data to VARIANT.
+In addition, table formats like Apache Paimon and Iceberg now support the VARIANT type, which
+enables users to efficiently process semi-structured data in the lakehouse using Flink SQL.
+
+Take the following SQL statements as an example:
+```sql
+CREATE TABLE t1 (
+  id INTEGER,
+  v STRING -- a JSON string
+) WITH (
+  'connector' = 'mysql-cdc',
+  ...
+);
+
+CREATE TABLE t2 (
+  id INTEGER,
+  v VARIANT
+) WITH (
+  'connector' = 'paimon',
+  ...
+);
+
+-- write to t2 with the VARIANT type
+INSERT INTO t2 SELECT id, PARSE_JSON(v) FROM t1;
+```
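+
+The two parsing functions differ in error handling: `PARSE_JSON` raises an error on malformed
+input, while `TRY_PARSE_JSON` returns NULL instead. A minimal sketch (the JSON literals here are
+made up for illustration):
+```sql
+-- PARSE_JSON fails on invalid JSON; TRY_PARSE_JSON returns NULL instead
+SELECT
+  PARSE_JSON('{"name": "Alice", "tags": ["a", "b"]}') AS valid_variant,
+  TRY_PARSE_JSON('not valid json')                    AS invalid_variant;
+```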
+
+**More Information**
+* [FLINK-37922](https://issues.apache.org/jira/browse/FLINK-37922)
+* [FLIP-521](https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing)
+* [Variant](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/types/#other-data-types)
+
+## Structured Type Enhancements
+
+In Flink 2.1, we enabled declaring user-defined objects via STRUCTURED types directly in
+`CREATE TABLE` DDL statements, resolving critical type equivalence issues and significantly
+improving API usability.
+
+Take the following SQL statements as an example:
+```sql
+CREATE TABLE MyTable (
+    uid BIGINT,
+    user STRUCTURED<'com.example.User', name STRING, age INT NOT NULL>
+);
+
+-- Casts a row type into a structured type
+INSERT INTO MyTable SELECT 1, CAST(('Bob', 42) AS STRUCTURED<'com.example.User', name STRING, age INT>);
+```
+
+**More Information**
+* [FLINK-37861](https://issues.apache.org/jira/browse/FLINK-37861)
+* [FLIP-520](https://cwiki.apache.org/confluence/display/FLINK/FLIP-520%3A+Simplify+StructuredType+handling)
+* [STRUCTURED](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/types/#user-defined-data-types)
+
+## Delta Join
+
+Flink 2.1 introduces a new DeltaJoin operator for stream processing jobs, along with optimizations
+for simple streaming join pipelines. Compared to a traditional streaming join, a delta join requires
+significantly less state, effectively mitigating issues related to large state, including resource
+bottlenecks, slow checkpointing, and lengthy job recovery times. This feature is enabled by default.
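+
+To make the shape of such a pipeline concrete, here is a minimal, illustrative sketch of a simple
+two-way streaming join on the join key. The table names, schemas, and connector placeholders are
+assumptions for this example, not requirements of the feature:
+```sql
+CREATE TABLE orders (
+  order_id BIGINT,
+  amount   DECIMAL(10, 2),
+  PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+  'connector' = '...'
+);
+
+CREATE TABLE payments (
+  order_id   BIGINT,
+  pay_status STRING,
+  PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+  'connector' = '...'
+);
+
+-- A simple equi-join on the key. With a regular streaming join, both inputs
+-- are fully materialized in state; the delta join optimization aims to avoid
+-- keeping that state for eligible pipelines.
+SELECT o.order_id, o.amount, p.pay_status
+FROM orders AS o
+JOIN payments AS p ON o.order_id = p.order_id;
+```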
+
+**More Information**
+* [FLINK-37836](https://issues.apache.org/jira/browse/FLINK-37836)
+* [Delta Join](https://cwiki.apache.org/confluence/display/FLINK/FLIP-486%3A+Introduce+A+New+DeltaJoin)
+
+## Multi-way Join
+
+Streaming Flink jobs with multiple cascaded streaming joins often experience operational
+instability and performance degradation due to large state sizes. This release introduces a
+multi-way join operator (`StreamingMultiJoinOperator`) that drastically reduces state size
+by eliminating intermediate results. The operator achieves this by processing joins across all
+input streams simultaneously within a single operator instance, storing only raw input records
+instead of propagated join output.
+
+This "zero intermediate state" approach primarily targets state reduction, offering substantial
+benefits in resource consumption and operational stability. The feature is available for pipelines
+with multiple INNER/LEFT joins that share at least one common join key, and can be enabled with
+`SET 'table.optimizer.multi-join.enabled' = 'true'`.
+
+**Benchmark**
+
+Here's a 10-way join benchmark comparing the default streaming join with the multi-way join
+optimization, using the following star schema join pattern:
+```sql
+-- Enable multi-way join optimization
+SET 'table.optimizer.multi-join.enabled' = 'true';
+
+-- star schema join pattern
+INSERT INTO JoinResultsMJ SELECT * FROM TenantKafka t
+LEFT JOIN SuppliersKafka s ON t.tenant_id = s.tenant_id AND ...
+LEFT JOIN ProductsKafka p ON t.tenant_id = p.tenant_id AND ...
+LEFT JOIN CategoriesKafka c ON t.tenant_id = c.tenant_id AND ...
+LEFT JOIN OrdersKafka o ON t.tenant_id = o.tenant_id AND ...
+LEFT JOIN CustomersKafka cust ON t.tenant_id = cust.tenant_id AND ...
+LEFT JOIN WarehousesKafka w ON t.tenant_id = w.tenant_id AND ...
+LEFT JOIN ShippingKafka sh ON t.tenant_id = sh.tenant_id AND ...
+LEFT JOIN PaymentKafka pay ON t.tenant_id = pay.tenant_id AND ...
+LEFT JOIN InventoryKafka i ON t.tenant_id = i.tenant_id AND ...;
+```
+
+You can observe the amount of intermediate state in the first section, the number of records
+processed when the operators reach 100% busyness in the second section, and the checkpoints in the
+third.
+
+<div style="text-align: center;">
+<img src="/img/blog/2025-07-31-release-2.1.0/multi_way_join.png" style="width:70%;margin:15px">
+</div>
+
+For the 10-way join above, which involves record amplification, the benchmark shows the following
+benefits:
+
+- Performance: 2x to over 100x increase in processed records, with both setups at 100% busyness.
+- State size: 3x to over 1000x smaller as intermediate state grows.
+
+**More Information**
+* [FLINK-37859](https://issues.apache.org/jira/browse/FLINK-37859)
+* [Multi-way Join](https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator)
+
+## Async Lookup Join Enhancements
+
+Async lookup joins now support handling records in order based on the upsert key (the unique key of
+the input stream deduced by the planner), while allowing parallel processing of different keys to
+achieve better throughput when processing changelog data streams.
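+
+For context, a lookup join uses the standard `FOR SYSTEM_TIME AS OF` syntax. The sketch below is
+illustrative, with made-up table and column names: records sharing the same upsert key keep their
+order, while lookups for different keys can run in parallel:
+```sql
+-- Illustrative lookup join: enrich an orders stream from a dimension table
+SELECT o.order_id, o.amount, c.customer_name
+FROM orders AS o
+JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
+  ON o.customer_id = c.customer_id;
+```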
+
+**More Information**
+* [FLINK-37874](https://issues.apache.org/jira/browse/FLINK-37874)
+* [Async Lookup 
Join](https://cwiki.apache.org/confluence/display/FLINK/FLIP-519%3A++Introduce+async+lookup+key+ordered+mode)
+
+## Sink Reuse
+
+Within a single Flink job, when multiple `INSERT INTO` statements update identical or different
+columns of the same target table, the planner will optimize the execution plan and merge the sink
+nodes to achieve reuse. This is a great usability improvement for users relying on partial-update
+features of data lake storages like Apache Paimon.
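+
+As an illustration (table names and columns are assumptions for this sketch), two statements in a
+statement set that update different columns of the same target table can now share a single sink
+node:
+```sql
+EXECUTE STATEMENT SET
+BEGIN
+  -- partial update of the address column
+  INSERT INTO user_profile (user_id, address) SELECT user_id, address FROM address_stream;
+  -- partial update of the last_login column
+  INSERT INTO user_profile (user_id, last_login) SELECT user_id, last_login FROM login_stream;
+END;
+```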
+
+**More Information**
+* [FLINK-37227](https://issues.apache.org/jira/browse/FLINK-37227)
+* [Sink Reuse](https://cwiki.apache.org/confluence/display/FLINK/FLIP-506%3A+Support+Reuse+Multiple+Table+Sinks+in+Planner)
+
+## Support Smile Format for Compiled Plan Serialization
+
+In Flink 2.1, we added Smile binary format support for compiled plans, providing a memory-efficient
+alternative to JSON for serialization/deserialization. JSON remains the default; to use the Smile
+format, call the `CompiledPlan#asSmileBytes` and `PlanReference#fromSmileBytes` methods.
+
+**More Information**
+* [FLINK-37341](https://issues.apache.org/jira/browse/FLINK-37341)
+* [Smile Format](https://cwiki.apache.org/confluence/display/FLINK/FLIP-508%3A+Add+support+for+Smile+format+for+Compiled+plans)
+
+# Runtime
+
+## Add Pluggable Batching for Async Sink
+
+In Flink 2.1, we introduced a pluggable batching mechanism for the Async Sink that allows users to
+define custom batch-write strategies tailored to their specific requirements.
+
+**More Information**
+* [FLINK-37298](https://issues.apache.org/jira/browse/FLINK-37298)
+* [Pluggable Batching for Async 
Sink](https://cwiki.apache.org/confluence/display/FLINK/FLIP-509+Add+pluggable+Batching+for+Async+Sink)
+
+## Split-level Watermark Metrics
+
+In Flink 2.1, we added split-level watermark metrics, covering watermark progress and per-split
+state gauges, to enhance watermark observability:
+
+- `currentWatermark`: the last watermark this split has received.
+- `activeTimeMsPerSecond`: the time this split is active per second.
+- `pausedTimeMsPerSecond`: the time this split is paused due to watermark alignment per second.
+- `idleTimeMsPerSecond`: the time this split is marked idle by idleness detection per second.
+- `accumulatedActiveTimeMs`: accumulated time this split was active since registered.
+- `accumulatedPausedTimeMs`: accumulated time this split was paused since registered.
+- `accumulatedIdleTimeMs`: accumulated time this split was idle since registered.
+
+**More Information**
+* [FLINK-37410](https://issues.apache.org/jira/browse/FLINK-37410)
+* [Watermark Metrics](https://cwiki.apache.org/confluence/display/FLINK/FLIP-513%3A+Split-level+Watermark+Metrics)
+
+# Connectors
+
+## Introduce SQL Connector for Keyed State
+
+In Flink 2.1, we introduced a new connector for keyed state. This connector allows users to query
+keyed state directly from a checkpoint or savepoint using Flink SQL, making it easier to inspect,
+debug, and validate the state of Flink jobs without custom tooling. This feature is especially
+useful for analyzing long-running jobs and validating state migrations.
+
+With a simple DDL, you can expose a ValueState as a table and run Flink SQL queries against the
+snapshot:
+```sql
+CREATE TABLE keyed_state (
+    k INTEGER,
+    user_id STRING,
+    balance DOUBLE
+) WITH (
+    'connector' = 'savepoint',
+    'path' = 'file:///savepoint/path',
+    'uid' = 'my-operator-id'
+);
+
+-- Query the keyed state
+SELECT * FROM keyed_state;
+```
+
+**More Information**
+* [FLINK-36929](https://issues.apache.org/jira/browse/FLINK-36929)
+* [Savepoint Connector](https://cwiki.apache.org/confluence/display/FLINK/FLIP-496%3A+SQL+connector+for+keyed+savepoint+data)
+
+# Other Improvements
+
+## PyFlink
+In PyFlink 2.1, we added support for Python 3.12 and removed support for Python 3.8.
+
+**More Information**
+* [FLINK-37823](https://issues.apache.org/jira/browse/FLINK-37823)
+* [FLINK-37776](https://issues.apache.org/jira/browse/FLINK-37776)
+
+
+## Upgrade flink-shaded version to 20.0
+
+Bump the flink-shaded version to 20.0 to support the Smile format.
+
+**More Information**
+* [FLINK-37376](https://issues.apache.org/jira/browse/FLINK-37376)
+
+## Upgrade Parquet version to 1.15.3
+
+Bump the Parquet version to 1.15.3 to resolve the parquet-avro module vulnerability reported in
+[CVE-2025-30065](https://nvd.nist.gov/vuln/detail/CVE-2025-30065).
+
+**More Information**
+* [FLINK-37760](https://issues.apache.org/jira/browse/FLINK-37760)
+
+# Upgrade Notes
+
+The Flink community tries to ensure that upgrades are as seamless as possible.
+However, certain changes may require users to make adjustments to certain parts
+of the program when upgrading to version 2.1. Please refer to the
+[release notes](https://nightlies.apache.org/flink/flink-docs-release-2.1/release-notes/flink-2.1/)
+for a comprehensive list of adjustments to make and issues to check during the
+upgrading process.
+
+# List of Contributors

Review Comment:
   Looks like it's `yanand0909` below. Not sure why some entries use the GitHub id and some use the real name, though :)


