RocMarshal commented on code in PR #817:
URL: https://github.com/apache/flink-web/pull/817#discussion_r2567636262
##########
docs/content/posts/2025-11-30-release-2.2.0.md:
##########
@@ -0,0 +1,312 @@
+---
+title: "Apache Flink 2.2.0: Advancing Real-Time Data + AI and Empowering Stream Processing for the AI Era"
+date: "2025-11-30T00:00:00.000Z"
+aliases:
+- /news/2025/11/30/release-2.2.0.html
+authors:
+- Hang:
+ name: "Hang Ruan"
+
+---
+
+The Apache Flink PMC is proud to announce the release of Apache Flink 2.2.0. Flink 2.2.0 further enriches AI capabilities, enhances materialized tables and the connector framework, and improves batch processing and PyFlink support. This release brings together 73 global contributors, implements 9 FLIPs (Flink Improvement Proposals), and resolves over 220 issues. We extend our gratitude to all contributors for their invaluable support!
+
+Let's dive into the highlights.
+
+# Flink SQL Improvements
+
+## Realtime AI Function
+Apache Flink has supported leveraging LLM capabilities through the `ML_PREDICT` function in Flink SQL since version 2.1, enabling users to perform semantic analysis in a simple and efficient way. In Flink 2.2, the Table API also supports model inference operations, allowing you to integrate machine learning models directly into your data processing pipelines. You can create models with specific providers (such as OpenAI) and use them to make predictions on your data.
+
+Example:
+- Creating and Using Models
+```java
+// Assumes the usual Table API static imports, e.g.:
+//   import static org.apache.flink.table.api.DataTypes.*;      // ROW, FIELD, STRING
+//   import static org.apache.flink.table.api.Expressions.row;
+
+// 1. Set up the local environment
+EnvironmentSettings settings = EnvironmentSettings.inStreamingMode();
+TableEnvironment tEnv = TableEnvironment.create(settings);
+
+// 2. Create a source table from in-memory data
+Table myTable = tEnv.fromValues(
+ ROW(FIELD("text", STRING())),
+ row("Hello"),
+ row("Machine Learning"),
+ row("Good morning")
+);
+
+// 3. Create model
+tEnv.createModel(
+ "my_model",
+ ModelDescriptor.forProvider("openai")
+ .inputSchema(Schema.newBuilder().column("input", STRING()).build())
+ .outputSchema(Schema.newBuilder().column("output", STRING()).build())
+ .option("endpoint", "https://api.openai.com/v1/chat/completions")
+ .option("model", "gpt-4.1")
+ .option("system-prompt", "translate to chinese")
+ .option("api-key", "<your-openai-api-key-here>")
+ .build()
+);
+
+Model model = tEnv.fromModel("my_model");
+
+// 4. Use the model to make predictions
+Table predictResult = model.predict(myTable, ColumnList.of("text"));
+
+// 5. Async prediction example
+Table asyncPredictResult = model.predict(
+ myTable,
+ ColumnList.of("text"),
+ Map.of("async", "true")
+);
+```
+
+**More Information**
+* [FLINK-38104](https://issues.apache.org/jira/browse/FLINK-38104)
+* [FLIP-526](https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API)
+
+## Vector Search
+
+Apache Flink has supported leveraging LLM capabilities through the `ML_PREDICT` function in Flink SQL since version 2.1. This integration has been technically validated in scenarios such as log classification and real-time question-answering systems. However, the previous architecture only allowed Flink to use embedding models to convert unstructured data (e.g., text, images) into high-dimensional vector features, which were then persisted to downstream storage systems; it lacked real-time online querying and similarity analysis capabilities for vector spaces. Flink 2.2 provides the `VECTOR_SEARCH` function to let users perform streaming vector similarity searches and real-time context retrieval directly within Flink.
+
+Take the following SQL statements as an example:
+```sql
+-- Basic usage
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+ TABLE vector_table,
+ input_table.vector_column,
+ DESCRIPTOR(index_column),
+ 10
+));
+
+-- With configuration options
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+ TABLE vector_table,
+ input_table.vector_column,
+ DESCRIPTOR(index_column),
+ 10,
+ MAP['async', 'true', 'timeout', '100s']
+));
+
+-- Using named parameters
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+ SEARCH_TABLE => TABLE vector_table,
+ COLUMN_TO_QUERY => input_table.vector_column,
+ COLUMN_TO_SEARCH => DESCRIPTOR(index_column),
+ TOP_K => 10,
+ CONFIG => MAP['async', 'true', 'timeout', '100s']
+));
+
+-- Searching with a constant value
+SELECT *
+FROM TABLE(VECTOR_SEARCH(
+ TABLE vector_table,
+ ARRAY[10, 20],
+ DESCRIPTOR(index_column),
+ 10
+));
+```
+
+**More Information**
+* [FLINK-38422](https://issues.apache.org/jira/browse/FLINK-38422)
+* [FLIP-540](https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL)
+* [Vector Search](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/sql/queries/vector-search/)
+
+## Materialized Table
+
+Materialized Table is a new table type introduced in Flink SQL, aimed at simplifying both batch and stream data pipelines and providing a consistent development experience. By specifying the data freshness and the query when creating a materialized table, the engine automatically derives the schema for the materialized table and creates a corresponding data refresh pipeline to achieve the specified freshness.
+
+From Flink 2.2, the FRESHNESS clause is no longer a mandatory part of the CREATE MATERIALIZED TABLE and CREATE OR ALTER MATERIALIZED TABLE DDL statements. Flink 2.2 also introduces a new MaterializedTableEnricher interface, which provides a formal extension point for customizable default logic, allowing advanced users and vendors to implement "smart" default behaviors (e.g., inferring freshness from upstream tables).
+
+Besides this, users can use `DISTRIBUTED BY` or `DISTRIBUTED INTO` to apply the bucketing concept to materialized tables, and can use `SHOW MATERIALIZED TABLES` to list all materialized tables.
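+
+The DDL additions above can be sketched as follows; the table and column names are hypothetical, and the exact clause placement should be checked against the materialized table DDL documentation:
+```sql
+-- FRESHNESS is now optional; a MaterializedTableEnricher implementation
+-- may supply a default (e.g., inferred from upstream tables).
+CREATE MATERIALIZED TABLE dws_daily_orders
+DISTRIBUTED INTO 4 BUCKETS
+AS SELECT order_date, COUNT(*) AS order_cnt
+   FROM dwd_orders
+   GROUP BY order_date;
+
+-- List all materialized tables in the current database.
+SHOW MATERIALIZED TABLES;
+```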
+
+**More Information**
+* [FLINK-38532](https://issues.apache.org/jira/browse/FLINK-38532)
+* [FLINK-38311](https://issues.apache.org/jira/browse/FLINK-38311)
+* [FLIP-542](https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables)
+* [FLIP-551](https://cwiki.apache.org/confluence/display/FLINK/FLIP-551%3A+Make+FRESHNESS+Optional+for+Materialized+Tables)
+
+## SinkUpsertMaterializer V2
+
+SinkUpsertMaterializer is an operator in Flink that reconciles out-of-order changelog events before sending them to an upsert sink. The performance of this operator degrades exponentially in some cases. Flink 2.2 introduces a new implementation that is optimized for such cases.
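+
+For context, whether this operator is inserted in front of an upsert sink is governed by a pre-existing table option (shown here for illustration; the option itself is not new in 2.2):
+```sql
+-- Controls whether SinkUpsertMaterializer is added before upsert sinks.
+SET 'table.exec.sink.upsert-materialize' = 'AUTO';  -- NONE | AUTO | FORCE
+```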
+
+**More Information**
+* [FLINK-38459](https://issues.apache.org/jira/browse/FLINK-38459)
+* [FLIP-544](https://cwiki.apache.org/confluence/display/FLINK/FLIP-544%3A+SinkUpsertMaterializer+V2)
+
+## Delta Join
+
+In 2.1, Apache Flink introduced a new delta join operator to mitigate the challenges caused by big state in regular joins. It replaces the large state maintained by regular joins with a bidirectional lookup-based join that directly reuses data from the source tables.
+
+Flink 2.2 enhances support for converting more SQL patterns into delta joins. Delta joins now support consuming CDC sources without DELETE operations, and allow projection and filter operations after the source. Additionally, delta joins include support for caching, which helps reduce requests to external storage.
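+
+For illustration, a dual-stream join on the join keys of both source tables is the typical shape the optimizer can rewrite into a delta join. The table names below are hypothetical, and whether the rewrite actually happens depends on the planner and the configured delta join strategy (see the tuning documentation linked below):
+```sql
+-- With delta join enabled, the planner may replace the stateful regular
+-- join below with bidirectional lookups that reuse data from the
+-- source tables instead of keeping both inputs in join state.
+SELECT o.order_id, o.amount, s.status
+FROM orders AS o
+JOIN shipments AS s
+  ON o.order_id = s.order_id;
+```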
+
+**More Information**
+* [Delta Joins](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/tuning/#delta-joins)
+
+# Runtime
+## Balanced Tasks Scheduling
+
+In Flink 2.1, we introduced a pluggable batching mechanism for async sink that
allows users to define custom
+batching write strategies tailored to specific requirements.
Review Comment:
This text doesn't seem suitable for describing the summary of changes in
this subsection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]