RocMarshal commented on code in PR #817:
URL: https://github.com/apache/flink-web/pull/817#discussion_r2567636262
##########
docs/content/posts/2025-11-30-release-2.2.0.md:
##########
@@ -0,0 +1,312 @@
+---
+title: "Apache Flink 2.2.0: Advancing Real-Time Data + AI and Empowering Stream Processing for the AI Era"
+date: "2025-11-30T00:00:00.000Z"
+aliases:
+- /news/2025/11/30/release-2.2.0.html
+authors:
+- Hang:
+ name: "Hang Ruan"
+
+---
+
+The Apache Flink PMC is proud to announce the release of Apache Flink 2.2.0. Flink 2.2.0 further enriches AI capabilities, enhances materialized tables and the connector framework, and improves batch processing and PyFlink support. This release brings together 73 global contributors, implements 9 FLIPs (Flink Improvement Proposals), and resolves over 220 issues. We extend our gratitude to all contributors for their invaluable support!
+
+Let's dive into the highlights.
+
+# Flink SQL Improvements
+
+## Realtime AI Function
+Apache Flink has supported leveraging LLM capabilities through the `ML_PREDICT` function in Flink SQL since version 2.1, enabling users to perform semantic analysis in a simple and efficient way. In Flink 2.2, the Table API also supports model inference operations, allowing you to integrate machine learning models directly into your data processing pipelines. You can create models with specific providers (such as OpenAI) and use them to make predictions on your data.
+
+Example:
+- Creating and Using Models
+```java
+// Assumes the usual Table API static imports, e.g.:
+//   import static org.apache.flink.table.api.DataTypes.*;      // ROW, FIELD, STRING
+//   import static org.apache.flink.table.api.Expressions.row;
+
+// 1. Set up the local environment
+EnvironmentSettings settings = EnvironmentSettings.inStreamingMode();
+TableEnvironment tEnv = TableEnvironment.create(settings);
+
+// 2. Create a source table from in-memory data
+Table myTable = tEnv.fromValues(
+ ROW(FIELD("text", STRING())),
+ row("Hello"),
+ row("Machine Learning"),
+ row("Good morning")
+);
+
+// 3. Create model
+tEnv.createModel(
+ "my_model",
+ ModelDescriptor.forProvider("openai")
+ .inputSchema(Schema.newBuilder().column("input", STRING()).build())
+ .outputSchema(Schema.newBuilder().column("output", STRING()).build())
+ .option("endpoint", "https://api.openai.com/v1/chat/completions")
+ .option("model", "gpt-4.1")
+ .option("system-prompt", "translate to chinese")
+ .option("api-key", "<your-openai-api-key-here>")
+ .build()
+);
+
+Model model = tEnv.fromModel("my_model");
+
+// 4. Use the model to make predictions
+Table predictResult = model.predict(myTable, ColumnList.of("text"));
+
+// 5. Async prediction example
+Table asyncPredictResult = model.predict(
+ myTable,
+ ColumnList.of("text"),
+ Map.of("async", "true")
+);
+```
+
+**More Information**
+* [FLINK-38104](https://issues.apache.org/jira/browse/FLINK-38104)
+* [FLIP-526](https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API)
+
+## Vector Search
+
+Apache Flink has supported leveraging LLM capabilities through the `ML_PREDICT` function in Flink SQL since version 2.1. This integration has been technically validated in scenarios such as log classification and real-time question-answering systems. However, the previous architecture only allowed Flink to use embedding models to convert unstructured data (e.g., text, images) into high-dimensional vector features, which were then persisted to downstream storage systems; it lacked real-time online querying and similarity analysis capabilities for vector spaces. Flink 2.2 provides the `VECTOR_SEARCH` function to let users perform streaming vector similarity searches and real-time context retrieval directly within Flink.
+
+Take the following SQL statements as an example:
+```sql
+-- Basic usage
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+ TABLE vector_table,
+ input_table.vector_column,
+ DESCRIPTOR(index_column),
+ 10
+));
+
+-- With configuration options
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+ TABLE vector_table,
+ input_table.vector_column,
+ DESCRIPTOR(index_column),
+ 10,
+ MAP['async', 'true', 'timeout', '100s']
+));
+
+-- Using named parameters
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+ SEARCH_TABLE => TABLE vector_table,
+ COLUMN_TO_QUERY => input_table.vector_column,
+ COLUMN_TO_SEARCH => DESCRIPTOR(index_column),
+ TOP_K => 10,
+ CONFIG => MAP['async', 'true', 'timeout', '100s']
+));
+
+-- Searching with a constant value
+SELECT *
+FROM TABLE(VECTOR_SEARCH(
+ TABLE vector_table,
+ ARRAY[10, 20],
+ DESCRIPTOR(index_column),
+ 10
+));
+```
+
+**More Information**
+* [FLINK-38422](https://issues.apache.org/jira/browse/FLINK-38422)
+* [FLIP-540](https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL)
+* [Vector Search](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/sql/queries/vector-search/)
+
+## Materialized Table
+
+Materialized Table is a new table type introduced in Flink SQL, aimed at simplifying both batch and stream data pipelines and providing a consistent development experience. By specifying the data freshness and the query when creating a materialized table, the engine automatically derives the schema for the materialized table and creates a corresponding data refresh pipeline to achieve the specified freshness.
+
+From Flink 2.2, the FRESHNESS clause is no longer a mandatory part of the CREATE MATERIALIZED TABLE and CREATE OR ALTER MATERIALIZED TABLE DDL statements. Flink 2.2 also introduces a new MaterializedTableEnricher interface, which provides a formal extension point for customizable default logic, allowing advanced users and vendors to implement "smart" default behaviors (e.g., inferring freshness from upstream tables).
+
+Besides this, users can use `DISTRIBUTED BY` or `DISTRIBUTED INTO` to apply the bucketing concept to materialized tables, and can use `SHOW MATERIALIZED TABLES` to list all materialized tables.
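+
+The DDL additions above can be sketched as follows; the table and column names are hypothetical, and the exact clause placement should be checked against the materialized table DDL documentation:
+```sql
+-- FRESHNESS is now optional; a MaterializedTableEnricher implementation
+-- may supply a default (e.g., inferred from upstream tables).
+CREATE MATERIALIZED TABLE dws_daily_orders
+DISTRIBUTED INTO 4 BUCKETS
+AS SELECT order_date, COUNT(*) AS order_cnt
+   FROM dwd_orders
+   GROUP BY order_date;
+
+-- List all materialized tables in the current database.
+SHOW MATERIALIZED TABLES;
+```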
+
+**More Information**
+* [FLINK-38532](https://issues.apache.org/jira/browse/FLINK-38532)
+* [FLINK-38311](https://issues.apache.org/jira/browse/FLINK-38311)
+* [FLIP-542](https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables)
+* [FLIP-551](https://cwiki.apache.org/confluence/display/FLINK/FLIP-551%3A+Make+FRESHNESS+Optional+for+Materialized+Tables)
+
+## SinkUpsertMaterializer V2
+
+SinkUpsertMaterializer is an operator in Flink that reconciles out-of-order changelog events before sending them to an upsert sink. The performance of this operator degrades exponentially in some cases. Flink 2.2 introduces a new implementation that is optimized for such cases.
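+
+For context, whether this operator is inserted in front of an upsert sink is governed by a pre-existing table option (shown here for illustration; the option itself is not new in 2.2):
+```sql
+-- Controls whether SinkUpsertMaterializer is added before upsert sinks.
+SET 'table.exec.sink.upsert-materialize' = 'AUTO';  -- NONE | AUTO | FORCE
+```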
+
+**More Information**
+* [FLINK-38459](https://issues.apache.org/jira/browse/FLINK-38459)
+* [FLIP-544](https://cwiki.apache.org/confluence/display/FLINK/FLIP-544%3A+SinkUpsertMaterializer+V2)
+
+## Delta Join
+
+In 2.1, Apache Flink introduced a new delta join operator to mitigate the challenges caused by big state in regular joins. It replaces the large state maintained by regular joins with a bidirectional lookup-based join that directly reuses data from the source tables.
+
+Flink 2.2 enhances support for converting more SQL patterns into delta joins. Delta joins now support consuming CDC sources without DELETE operations, and allow projection and filter operations after the source. Additionally, delta joins include support for caching, which helps reduce requests to external storage.
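+
+For illustration, a dual-stream join on the join keys of both source tables is the typical shape the optimizer can rewrite into a delta join. The table names below are hypothetical, and whether the rewrite actually happens depends on the planner and the configured delta join strategy (see the tuning documentation linked below):
+```sql
+-- With delta join enabled, the planner may replace the stateful regular
+-- join below with bidirectional lookups that reuse data from the
+-- source tables instead of keeping both inputs in join state.
+SELECT o.order_id, o.amount, s.status
+FROM orders AS o
+JOIN shipments AS s
+  ON o.order_id = s.order_id;
+```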
+
+**More Information**
+* [Delta Joins](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/tuning/#delta-joins)
+
+# Runtime
+## Balanced Tasks Scheduling
+
+In Flink 2.1, we introduced a pluggable batching mechanism for async sink that
allows users to define custom
+batching write strategies tailored to specific requirements.
Review Comment:
This text doesn't seem suitable for describing the summary of changes in
this subsection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]