Re: [PR] Add Flink 2.1.0 release [flink-web]

via GitHub Fri, 25 Jul 2025 07:49:03 -0700


twalthr commented on code in PR #800:
URL: https://github.com/apache/flink-web/pull/800#discussion_r2231215821



##########
docs/content/posts/2025-07-31-release-2.1.0.md:
##########
@@ -0,0 +1,475 @@
+---
+authors:
+  - reswqa:
+    name: "Ron Liu"
+    twitter: "Ron999"
+
+date: "2025-07-31T08:00:00Z"
+subtitle: ""
+title: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive 
Upgrades
+aliases:
+  - /news/2025/07/31/release-2.1.0.html
+---
+
+The Apache Flink PMC is proud to announce the release of Apache Flink 2.1.0. 
This marks a significant milestone 
+in the evolution of the real-time data processing engine into a unified Data + 
AI platform. This release brings 
+together 116 global contributors, implements 15 FLIPs (Flink Improvement 
Proposals), and resolves over 220 issues, 
+with a strong focus on deepening the integration of real-time AI and 
intelligent stream processing:
+
+1. **Breakthroughs in Real-Time AI**:
+   - Introduces AI Model DDL, enabling flexible management of AI models 
through Flink SQL and the Table API.
+
+   - Extends the `ML_PREDICT` Table-Valued Function (TVF), empowering 
real-time invocation of AI models within Flink SQL, 
+     laying the foundation for building end-to-end real-time AI workflows.
+
+2. **Enhanced Real-Time Data Processing**:
+   - Adds the `VARIANT` data type for efficient handling of semi-structured 
data like JSON. Combined with the `PARSE_JSON` function 
+     and lakehouse formats (e.g., Apache Paimon), it enables dynamic schema 
data analysis.
+
+   - Significantly optimizes streaming joins with the innovative introduction 
of `DeltaJoin` and `MultiJoin` strategies, 

Review Comment:
   We should also mention `ProcessTableFunction` here. For Flink SQL and the Table 
API, this is equally important or has an even higher impact than e.g. VARIANT.



##########
docs/content/posts/2025-07-31-release-2.1.0.md:
##########
@@ -0,0 +1,475 @@
+---
+authors:
+  - reswqa:
+    name: "Ron Liu"
+    twitter: "Ron999"
+
+date: "2025-07-31T08:00:00Z"
+subtitle: ""
+title: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive 
Upgrades
+aliases:
+  - /news/2025/07/31/release-2.1.0.html
+---
+
+The Apache Flink PMC is proud to announce the release of Apache Flink 2.1.0. 
This marks a significant milestone 
+in the evolution of the real-time data processing engine into a unified Data + 
AI platform. This release brings 
+together 116 global contributors, implements 15 FLIPs (Flink Improvement 
Proposals), and resolves over 220 issues, 
+with a strong focus on deepening the integration of real-time AI and 
intelligent stream processing:
+
+1. **Breakthroughs in Real-Time AI**:
+   - Introduces AI Model DDL, enabling flexible management of AI models 
through Flink SQL and the Table API.
+
+   - Extends the `ML_PREDICT` Table-Valued Function (TVF), empowering 
real-time invocation of AI models within Flink SQL, 
+     laying the foundation for building end-to-end real-time AI workflows.
+
+2. **Enhanced Real-Time Data Processing**:
+   - Adds the `VARIANT` data type for efficient handling of semi-structured 
data like JSON. Combined with the `PARSE_JSON` function 
+     and lakehouse formats (e.g., Apache Paimon), it enables dynamic schema 
data analysis.
+
+   - Significantly optimizes streaming joins with the innovative introduction 
of `DeltaJoin` and `MultiJoin` strategies, 
+     eliminating state bottlenecks and improving resource utilization and job 
stability.
+
+Flink 2.1.0 seamlessly integrates real-time data processing with AI models, 
empowering enterprises to advance from 
+real-time analytics to real-time intelligent decision-making, meeting the 
evolving demands of modern data applications. 
+We extend our gratitude to all contributors for their invaluable support!
+
+Let's dive into the highlights.
+
+# Flink SQL Improvements
+
+## Model DDLs
+Since Flink 2.0, we have introduced dedicated syntax for AI models, enabling 
users to define models 
+as easily as creating catalog objects and invoke them like standard functions 
or table functions in SQL statements. 
+In Flink 2.1, we have also added Table API support for Model DDLs, enabling users 
to define and manage AI models programmatically 
+via the Table API in both Java and Python. This provides a flexible, 
code-driven alternative to SQL for model management and
+integration within Flink applications.
+
+Example: 
+- Defining a Model via Flink SQL
+```sql
+CREATE MODEL my_model
+INPUT (f0 STRING)
+OUTPUT (label STRING)
+WITH (
+  'task' = 'classification',
+  'type' = 'remote',
+  'provider' = 'openai',
+  'openai.endpoint' = 'remote',
+  'openai.api_key' = 'abcdefg'
+);
+```
+
+- Defining a Model via Table API (Java)
+```java
+tEnv.createModel(
+    "MyModel", 
+    ModelDescriptor.forProvider("OPENAI")
+      .inputSchema(Schema.newBuilder()
+        .column("f0", DataTypes.STRING())
+        .build())
+      .outputSchema(Schema.newBuilder()
+        .column("label", DataTypes.STRING())
+        .build())
+      .option("task", "classification")
+      .option("type", "remote")
+      .option("provider", "openai")
+      .option("openai.endpoint", "remote")
+      .option("openai.api_key", "abcdefg")
+      .build(),
+    true);
+```
+
+**More Information**
+* [FLINK-37548](https://issues.apache.org/jira/browse/FLINK-37548)
+* 
[FLIP-437](https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL)
+* 
[FLIP-507](https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API)
+
+## Realtime AI Function
+
+Building on the AI Model DDL, Flink 2.1 extends the `ML_PREDICT` 
table-valued function (TVF) to perform real-time model inference in SQL queries, 
applying machine learning models to data streams seamlessly.
+The implementation supports both Flink's built-in model providers (OpenAI) and 
interfaces for users to define custom model providers, accelerating Flink's 
evolution from a real-time 
+data processing engine to a unified real-time AI platform. Looking ahead, we 
plan to introduce more AI functions, such as `ML_EVALUATE` and `VECTOR_SEARCH`, to 
unlock an end-to-end experience 
+for real-time data processing, model training, and inference.
+
+Take the following SQL statements as an example:
+```sql
+-- Declare an AI model
+CREATE MODEL `my_model`
+INPUT (text STRING)
+OUTPUT (response STRING)
+WITH(
+  'provider' = 'openai',
+  'endpoint' = 'https://api.openai.com/v1/llm/v1/chat',
+  'api-key' = 'abcdefg',
+  'system-prompt' = 'translate to Chinese',
+  'model' = 'gpt-4o'
+);
+
+-- Basic usage
+SELECT * FROM ML_PREDICT(
+  TABLE input_table,
+  MODEL my_model,
+  DESCRIPTOR(text)
+);
+
+-- With configuration options
+SELECT * FROM ML_PREDICT(
+  TABLE input_table,
+  MODEL my_model,
+  DESCRIPTOR(text),
+  MAP['async', 'true', 'timeout', '100s']
+);
+
+-- Using named parameters
+SELECT * FROM ML_PREDICT(
+  INPUT => TABLE input_table,
+  MODEL => MODEL my_model,
+  ARGS => DESCRIPTOR(text),
+  CONFIG => MAP['async', 'true']
+);
+```
+
+**More Information**
+* [FLINK-34992](https://issues.apache.org/jira/browse/FLINK-34992)
+* [FLINK-37777](https://issues.apache.org/jira/browse/FLINK-37777)
+* 
[FLIP-437](https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL)
+* 
[FLIP-525](https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design)
+* [Model 
Inference](https://nightlies.apache.org/flink/flink-docs-release-2.1/docs/dev/table/sql/queries/model-inference/)
+
+## Variant Type

Review Comment:
   Please add a section for ProcessTableFunction:
   
   ```
   Apache Flink now includes support for Process Table Functions (PTFs), the 
most powerful function kind for Flink SQL and Table API.
   
   Conceptually, a PTF is a superset of all other user-defined functions, 
mapping zero, one, or multiple tables to zero, one, or multiple rows. They 
enable implementing user-defined operators that can be as feature-rich as 
built-in operations. PTFs have access to Flink's managed state, event-time, 
timer services, and table changelogs.
   
   PTFs enable the following tasks:
   - Apply transformations on each row of a table.
   - Logically partition the table into distinct sets and apply transformations 
per set.
   - Store seen events for repeated access.
   - Continue processing at a later point in time, enabling waiting, 
synchronization, or timeouts.
   - Buffer and aggregate events using complex state machines or rule-based 
conditional logic.
   
   This moves Flink SQL significantly closer to the DataStream API while retaining 
the robustness and familiarity of the existing SQL ecosystem.
   
   Detailed information on PTF syntax and semantics can be found here: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/ptfs/
   
   Take the following code as an example:
   
   // Declare a ProcessTableFunction for memorizing your customers
   public static class GreetingWithMemory extends ProcessTableFunction<String> {
     public static class CountState {
       public long counter = 0L;
     }
   
     public void eval(
         @StateHint CountState state,
         @ArgumentHint(SET_SEMANTIC_TABLE) Row input) {
       state.counter++;
       collect(
           "Hello " + input.getFieldAs("name") + ", your " + state.counter + " time?");
     }
   }
   
   TableEnvironment env = TableEnvironment.create(...);
   
   // Call the PTF in Table API
   env.fromValues("Bob", "Alice", "Bob")
     .as("name")
     .partitionBy($("name"))
     .process(GreetingWithMemory.class)
     .execute()
     .print();
   
   // Call the PTF in SQL
   env.executeSql(
       "SELECT * FROM GreetingWithMemory(TABLE Names PARTITION BY name)")
     .print();
   
   
   **More Information**
   - FLIP-440: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
   ```
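
For readers without a Flink setup, the per-partition state behaviour of the `GreetingWithMemory` example above can be approximated in plain Java. This is only a sketch under assumptions: `GreetingWithMemorySketch`, its `process` helper, and the `HashMap` standing in for Flink's managed state are hypothetical names chosen for illustration and are not part of any Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation only: no Flink classes are used here.
class GreetingWithMemorySketch {

    // Stand-in for the per-partition state that Flink would manage for a PTF.
    static final class CountState {
        long counter = 0L;
    }

    // One state object per partition key, mimicking PARTITION BY name.
    private final Map<String, CountState> stateByKey = new HashMap<>();

    // Mirrors the body of eval(): bump the counter for this key and emit a row.
    String eval(String name) {
        CountState state = stateByKey.computeIfAbsent(name, k -> new CountState());
        state.counter++;
        return "Hello " + name + ", your " + state.counter + " time?";
    }

    // Feeds rows through eval() in arrival order and collects the emitted rows.
    static List<String> process(List<String> names) {
        GreetingWithMemorySketch fn = new GreetingWithMemorySketch();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(fn.eval(name));
        }
        return out;
    }

    public static void main(String[] args) {
        process(Arrays.asList("Bob", "Alice", "Bob")).forEach(System.out::println);
    }
}
```

The sketch processes rows strictly in arrival order on a single thread, so the second `Bob` row sees a counter of 2. In a distributed job, the relative order of rows from different partitions is not something this simulation models.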



##########
docs/content/posts/2025-07-31-release-2.1.0.md:
##########
@@ -0,0 +1,475 @@
+---
+authors:
+  - reswqa:
+    name: "Ron Liu"
+    twitter: "Ron999"
+
+date: "2025-07-31T08:00:00Z"
+subtitle: ""
+title: Ushers in a New Era of Unified Real-Time Data + AI with Comprehensive 
Upgrades
+aliases:
+  - /news/2025/07/31/release-2.1.0.html
+---
+
+The Apache Flink PMC is proud to announce the release of Apache Flink 2.1.0. 
This marks a significant milestone 
+in the evolution of the real-time data processing engine into a unified Data + 
AI platform. This release brings 
+together 116 global contributors, implements 15 FLIPs (Flink Improvement 
Proposals), and resolves over 220 issues, 
+with a strong focus on deepening the integration of real-time AI and 
intelligent stream processing:
+
+1. **Breakthroughs in Real-Time AI**:
+   - Introduces AI Model DDL, enabling flexible management of AI models 
through Flink SQL and the Table API.
+
+   - Extends the `ML_PREDICT` Table-Valued Function (TVF), empowering 
real-time invocation of AI models within Flink SQL, 
+     laying the foundation for building end-to-end real-time AI workflows.
+
+2. **Enhanced Real-Time Data Processing**:
+   - Adds the `VARIANT` data type for efficient handling of semi-structured 
data like JSON. Combined with the `PARSE_JSON` function 
+     and lakehouse formats (e.g., Apache Paimon), it enables dynamic schema 
data analysis.
+
+   - Significantly optimizes streaming joins with the innovative introduction 
of `DeltaJoin` and `MultiJoin` strategies, 

Review Comment:
   ```
   - Process Table Functions (PTFs) open up the Flink SQL engine to more 
event-driven applications, giving access to Flink’s managed state, event-time 
and timer services, and underlying table changelogs.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
