xintongsong commented on code in PR #751:
URL: https://github.com/apache/flink-web/pull/751#discussion_r1684056042


##########
docs/data/flink.yml:
##########
@@ -15,6 +15,16 @@
 # specific language governing permissions and limitations
 # under the License
 
+1.20:

Review Comment:
   I'd suggest removing both 1.16 & 1.17.



##########
docs/content/posts/2024-07-15-release-1.20.0.md:
##########
@@ -0,0 +1,560 @@
+---
+authors:
+- reswqa:
+  name: "Weijie Guo"
+  twitter: "WeijieGuo12"
+- 1996fanrui:
+  name: "Rui Fan"
+  twitter: "1996fanrui"
+
+date: "2024-07-15T08:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.20
+aliases:
+- /news/2024/07/15/release-1.20.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 
1.20.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new 
features. Overall, 142
+people contributed to this release, completing 13 FLIPs and 300+ issues. Thank 
you!
+
+Let's dive into the highlights.
+
+# Standing on the Eve of Apache Flink 2.0
+
+Flink 1.0 was released seven years ago. For several months, the community has been 
actively planning and taking steps towards
+the next major release. The new 1.20 release is planned to be the last minor 
release before Flink 2.0, which is anticipated by the end of 2024.
+
+Starting from Flink 1.19, the community has decided to officially deprecate 
multiple APIs that had been approaching 
+end of life for a while. In 1.20, we further sorted through all relevant APIs 
that might need to be replaced
+or deprecated to clear the way for the 2.0 release:
+- Configuration Improvements: As Flink moves towards version 2.0, we have 
revisited all runtime & Table API/SQL 
+configuration options and identified several opportunities to enhance 
user-friendliness and maintainability.
+- Deprecate the Legacy 
[SinkFunction](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.html)
 API: Since its introduction in Flink 1.12, the [Unified Sink 
API](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/api/connector/sink2/Sink.html)
+has undergone extensive development and testing. Over multiple release cycles, 
the API has demonstrated
+stability and robustness, aligning with the criteria set forth in 
[FLIP-197](https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process)
 for API stability graduation.
+Therefore, we promote the Unified Sink API v2 to `@Public` and deprecate the 
legacy `SinkFunction` interface.
+
+It has been seven years since the Flink community's last major release, and we 
have great expectations for Flink 2.0.
+We are planning to release several high-impact features in `2.x`. Some of them 
have already been introduced in Flink 1.20 in an MVP (minimum viable product) state and 
are discussed in more detail below.
+- Introduce a New Materialized Table for Simplifying Data Pipelines: 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
 is designed to simplify the development of
+data processing pipelines. Using dynamic tables with uniform SQL statements and 
a freshness specification, users can define batch
+and streaming transformations to data in the same way, accelerate ETL pipeline 
development, and manage task scheduling
+automatically. See below for more details on this exciting feature.
+- Unified File Merging Mechanism for Checkpoints: The unified file merging 
mechanism for checkpointing is introduced to
+Flink 1.20 as an MVP feature, which allows scattered small checkpoint files to 
be written into larger files, reducing
+the number of file creations and file deletions and alleviating the pressure 
of file system metadata management raised by
+the file flooding problem during checkpoints.
+
+# Flink SQL Improvements
+
+## Introduce Materialized Tables
+
+We introduced the Materialized Table abstraction in Flink SQL, a new table type 
designed to simplify both batch and streaming
+data pipelines while providing a consistent development experience.
+
+Materialized tables are defined with a query and a data freshness 
specification. The engine automatically derives the table
+schema and creates a data refresh pipeline to maintain the query result with 
the requested freshness. Users are relieved from
+the burden of comprehending the concepts and differences between streaming and 
batch processing, and they do not have to directly
+maintain Flink streaming or batch jobs. All operations are done on 
Materialized tables, which can significantly accelerate ETL pipeline
+development.
+
+Here is an example to create a materialized table that is constantly refreshed 
with a data freshness of `3` minutes.
+
+```sql
+-- 1. Create table schema and data refresh pipeline
+CREATE MATERIALIZED TABLE dwd_orders
+(
+ PRIMARY KEY(ds, id) NOT ENFORCED
+)
+PARTITIONED BY (ds)
+FRESHNESS = INTERVAL '3' MINUTE
+AS SELECT 
+ o.ds,
+ o.id,
+ o.order_number,
+ o.user_id,
+...
+FROM 
+ orders as o
+ LEFT JOIN products FOR SYSTEM_TIME AS OF proctime() AS prod
+ ON o.product_id = prod.id
+ LEFT JOIN order_pay AS pay
+ ON o.id = pay.order_id and o.ds = pay.ds;
+
+-- 2. Pause the data refresh pipeline
+ALTER MATERIALIZED TABLE dwd_orders SUSPEND;
+
+-- 3. Resume the data refresh pipeline
+ALTER MATERIALIZED TABLE dwd_orders RESUME
+-- Set table option via WITH clause
+WITH(
+ 'sink.parallelism' = '10'
+);
+
+-- Refresh historical partition manually
+ALTER MATERIALIZED TABLE dwd_orders REFRESH PARTITION(ds='20231023');
+```
+
+**More Information**
+* [FLINK-35187](https://issues.apache.org/jira/browse/FLINK-35187)
+* 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
+* [Materialized Table 
Overview](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/materialized-table/overview/)
+
+
+## Introduce Catalog-Related Syntax
+
+With the growing adoption of Flink SQL, implementations of Flink's `Catalog` 
interface play an increasingly important role. Today, Flink features a JDBC and 
a Hive catalog implementation, and other open source projects such as Apache 
Paimon integrate with this interface as well.
+
+Now in Flink 1.20, you can use the `DQL` syntax to obtain detailed metadata 
from existing catalogs, and the
+`DDL` syntax to modify metadata such as properties or comments of a specified catalog.
+
+```sql
+Flink SQL> CREATE CATALOG `cat` WITH ('type'='generic_in_memory', 
'default-database'='db');
+[INFO] Execute statement succeeded.
+
+Flink SQL> SHOW CREATE CATALOG `cat`;
++---------------------------------------------------------------------------------------------+
+|                                                                              
        result |
++---------------------------------------------------------------------------------------------+
+| CREATE CATALOG `cat` WITH (
+  'default-database' = 'db',
+  'type' = 'generic_in_memory'
+)
+|
++---------------------------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> DESCRIBE CATALOG `cat`;
++-----------+-------------------+
+| info name |        info value |
++-----------+-------------------+
+|      name |               cat |
+|      type | generic_in_memory |
+|   comment |                   |
++-----------+-------------------+
+3 rows in set
+
+Flink SQL> ALTER CATALOG `cat` SET ('default-database'='new-db');
+[INFO] Execute statement succeeded.
+
+Flink SQL> SHOW CREATE CATALOG `cat`;
++-------------------------------------------------------------------------------------------------+
+|                                                                              
            result |
++-------------------------------------------------------------------------------------------------+
+| CREATE CATALOG `cat` WITH (
+  'default-database' = 'new-db',
+  'type' = 'generic_in_memory'
+)
+|
++-------------------------------------------------------------------------------------------------+
+1 row in set
+```
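+
+The same statements can also be submitted programmatically. Below is a minimal, illustrative Table API sketch (the class name is hypothetical); it simply replays the statements from the SQL CLI session above:
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+public class CatalogSyntaxExample {
+    public static void main(String[] args) {
+        TableEnvironment tEnv =
+                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
+
+        // Create the catalog, inspect its metadata, then alter one of its properties.
+        tEnv.executeSql(
+                "CREATE CATALOG `cat` WITH ('type'='generic_in_memory', 'default-database'='db')");
+        tEnv.executeSql("SHOW CREATE CATALOG `cat`").print();
+        tEnv.executeSql("DESCRIBE CATALOG `cat`").print();
+        tEnv.executeSql("ALTER CATALOG `cat` SET ('default-database'='new-db')");
+    }
+}
+```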
+
+**More Information**
+* [FLINK-34914](https://issues.apache.org/jira/browse/FLINK-34914)
+* 
[FLIP-436](https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax)
+
+
+## Add DISTRIBUTED BY Clause
+
+Many SQL engines expose the concepts of `Partitioning`, `Bucketing`, or 
`Clustering`. Flink 1.20 introduces
+the concept of `Bucketing`.
+
+Buckets enable load balancing in an external storage system by splitting data 
into disjoint subsets. Bucketing depends
+heavily on the semantics of the underlying connector. However, a user can 
influence the bucketing behavior by
+specifying the number of buckets, the bucketing algorithm, and (if the 
algorithm allows it) the columns which 
+are used for target bucket calculation. All bucketing components (i.e. bucket 
number, distribution algorithm, bucket key columns)
+are optional from a SQL syntax perspective.
+
+Take the following SQL statements as an example:
+
+```sql
+-- declares a hash function on a fixed number of 4 buckets (i.e. HASH(uid) % 4 
= target bucket).
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY HASH(uid) INTO 4 
BUCKETS;
+
+-- leaves the selection of an algorithm up to the connector.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid) INTO 4 
BUCKETS;
+
+-- leaves the number of buckets up to the connector.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid);
+
+-- only defines the number of buckets.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED INTO 4 BUCKETS;
+```
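+
+For illustration only, the following Java sketch shows how a connector might implement the modulo scheme from the first statement (i.e. `HASH(uid) % 4 = target bucket`). The class and method are hypothetical and not part of Flink's API, since the actual bucketing semantics are connector-specific:
+
+```java
+/** Hypothetical helper; real connectors define their own bucketing semantics. */
+public final class BucketAssigner {
+
+    /** Maps a uid to a target bucket via HASH(uid) % numBuckets. */
+    public static int targetBucket(long uid, int numBuckets) {
+        // floorMod keeps the result in [0, numBuckets), even for negative hash values.
+        return Math.floorMod(Long.hashCode(uid), numBuckets);
+    }
+
+    public static void main(String[] args) {
+        System.out.println(targetBucket(42L, 4)); // bucket for uid = 42 with 4 buckets
+    }
+}
+```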
+
+**More Information**
+* [FLINK-33494](https://issues.apache.org/jira/browse/FLINK-33494)
+* 
[FLIP-376](https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/create/#distributed)
+
+# State & Checkpoint Improvements
+
+## Unified File Merging Mechanism for Checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature.
+It combines multiple small checkpoint files into fewer, larger files, which 
reduces the number of file creation
+and file deletion operations and alleviates the pressure of file system 
metadata management during checkpoints.
+
+The mechanism can be enabled by setting 
`execution.checkpointing.file-merging.enabled` to `true`. For more advanced 
options
+and the principles behind this feature, please refer to the Checkpointing 
documentation.
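+
+As a minimal sketch, the flag can also be set programmatically when building the environment. Only `execution.checkpointing.file-merging.enabled` is taken from this release; the checkpoint interval below is just an illustrative value:
+
+```java
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+public class FileMergingExample {
+    public static void main(String[] args) {
+        Configuration conf = new Configuration();
+        // Enable the unified file merging mechanism (MVP feature in 1.20).
+        conf.setString("execution.checkpointing.file-merging.enabled", "true");
+
+        StreamExecutionEnvironment env =
+                StreamExecutionEnvironment.getExecutionEnvironment(conf);
+        env.enableCheckpointing(60_000); // checkpoint every 60 seconds (illustrative)
+        // ... define and execute the job as usual ...
+    }
+}
+```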
+
+**More Information**
+* [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)
+* 
[FLIP-306](https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/fault-tolerance/checkpointing/#unify-file-merging-mechanism-for-checkpoints-experimental)
+
+## Compaction of Small SST Files
+
+In some cases, the number of files produced by the RocksDB state backend grows 
indefinitely. In addition to the overhead caused by many small files,
+this behavior can cause the task state info to exceed the RPC message size 
limit and therefore lead to recovery or checkpoint failures.
+
+From release 1.20 on, Flink can merge such files in the background using the 
RocksDB API.
+
+**More Information**
+* [FLINK-26050](https://issues.apache.org/jira/browse/FLINK-26050)
+
+## Reorganizing all State & Checkpointing Configuration Options
+
+In Flink 1.20, all the options about state and checkpointing are reorganized 
and categorized by prefixes as listed below:
+1. `execution.checkpointing.*`: all configuration options associated with 
checkpointing and savepoints.
+2. `execution.state-recovery.*`: all configuration options related to state 
recovery.
+3. `state.*`: all configuration options related to state access.
+   a. `state.backend.*`: configuration options for individual state backends, 
such as RocksDB.
+   b. `state.changelog.*`: configuration options for the state changelog, as 
outlined in FLIP-158, including the options for the "Durable Short-term Log" 
(DSTL).
+   c. `state.latency-track.*`: configuration options related to the latency 
tracking of state access.

Review Comment:
   I think there are more configuration-related changes. Maybe we should add a 
dedicated section for them, rather than scattering configuration changes across 
the sections of each component.



##########
docs/content/posts/2024-07-15-release-1.20.0.md:
##########
@@ -0,0 +1,560 @@
+---
+authors:
+- reswqa:
+  name: "Weijie Guo"
+  twitter: "WeijieGuo12"
+- 1996fanrui:
+  name: "Rui Fan"
+  twitter: "1996fanrui"
+
+date: "2024-07-15T08:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.20
+aliases:
+- /news/2024/07/15/release-1.20.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 
1.20.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new 
features. Overall, 142
+people contributed to this release, completing 13 FLIPs and 300+ issues. Thank 
you!
+
+Let's dive into the highlights.
+
+# Standing on the Eve of Apache Flink 2.0
+
+Flink 1.0 was released seven years ago. For several months, the community has been 
actively planning and taking steps towards
+the next major release. The new 1.20 release is planned to be the last minor 
release before Flink 2.0, which is anticipated by the end of 2024.
+
+Starting from Flink 1.19, the community has decided to officially deprecate 
multiple APIs that had been approaching 
+end of life for a while. In 1.20, we further sorted through all relevant APIs 
that might need to be replaced
+or deprecated to clear the way for the 2.0 release:
+- Configuration Improvements: As Flink moves towards version 2.0, we have 
revisited all runtime & Table API/SQL 
+configuration options and identified several opportunities to enhance 
user-friendliness and maintainability.
+- Deprecate the Legacy 
[SinkFunction](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.html)
 API: Since its introduction in Flink 1.12, the [Unified Sink 
API](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/api/connector/sink2/Sink.html)
+has undergone extensive development and testing. Over multiple release cycles, 
the API has demonstrated
+stability and robustness, aligning with the criteria set forth in 
[FLIP-197](https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process)
 for API stability graduation.
+Therefore, we promote the Unified Sink API v2 to `@Public` and deprecate the 
legacy `SinkFunction` interface.
+
+It has been seven years since the Flink community's last major release, and we 
have great expectations for Flink 2.0.
+We are planning to release several high-impact features in `2.x`. Some of them 
have already been introduced in Flink 1.20 in an MVP (minimum viable product) state and 
are discussed in more detail below.
+- Introduce a New Materialized Table for Simplifying Data Pipelines: 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
 is designed to simplify the development of
+data processing pipelines. Using dynamic tables with uniform SQL statements and 
a freshness specification, users can define batch
+and streaming transformations to data in the same way, accelerate ETL pipeline 
development, and manage task scheduling
+automatically. See below for more details on this exciting feature.
+- Unified File Merging Mechanism for Checkpoints: The unified file merging 
mechanism for checkpointing is introduced to
+Flink 1.20 as an MVP feature, which allows scattered small checkpoint files to 
be written into larger files, reducing
+the number of file creations and file deletions and alleviating the pressure 
of file system metadata management raised by
+the file flooding problem during checkpoints.
+
+# Flink SQL Improvements
+
+## Introduce Materialized Tables
+
+We introduced the Materialized Table abstraction in Flink SQL, a new table type 
designed to simplify both batch and streaming
+data pipelines while providing a consistent development experience.
+
+Materialized tables are defined with a query and a data freshness 
specification. The engine automatically derives the table

Review Comment:
   @davidradl Materialized tables are not always batch jobs. Users define how 
the table should be derived / calculated from other source tables with SQL, 
together with a freshness specification. Flink will automatically decide how the 
job should be executed (in streaming or batch mode) based on that freshness.



##########
docs/content/posts/2024-07-15-release-1.20.0.md:
##########
@@ -0,0 +1,560 @@
+---
+authors:
+- reswqa:
+  name: "Weijie Guo"
+  twitter: "WeijieGuo12"
+- 1996fanrui:
+  name: "Rui Fan"
+  twitter: "1996fanrui"
+
+date: "2024-07-15T08:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.20
+aliases:
+- /news/2024/07/15/release-1.20.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 
1.20.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new 
features. Overall, 142
+people contributed to this release, completing 13 FLIPs and 300+ issues. Thank 
you!
+
+Let's dive into the highlights.
+
+# Standing on the Eve of Apache Flink 2.0
+
+Flink 1.0 was released seven years ago. For several months, the community has been 
actively planning and taking steps towards
+the next major release. The new 1.20 release is planned to be the last minor 
release before Flink 2.0, which is anticipated by the end of 2024.
+
+Starting from Flink 1.19, the community has decided to officially deprecate 
multiple APIs that had been approaching 
+end of life for a while. In 1.20, we further sorted through all relevant APIs 
that might need to be replaced
+or deprecated to clear the way for the 2.0 release:
+- Configuration Improvements: As Flink moves towards version 2.0, we have 
revisited all runtime & Table API/SQL 
+configuration options and identified several opportunities to enhance 
user-friendliness and maintainability.
+- Deprecate the Legacy 
[SinkFunction](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.html)
 API: Since its introduction in Flink 1.12, the [Unified Sink 
API](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/api/connector/sink2/Sink.html)
+has undergone extensive development and testing. Over multiple release cycles, 
the API has demonstrated
+stability and robustness, aligning with the criteria set forth in 
[FLIP-197](https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process)
 for API stability graduation.
+Therefore, we promote the Unified Sink API v2 to `@Public` and deprecate the 
legacy `SinkFunction` interface.
+
+It has been seven years since the Flink community's last major release, and we 
have great expectations for Flink 2.0.
+We are planning to release several high-impact features in `2.x`. Some of them 
have already been introduced in Flink 1.20 in an MVP (minimum viable product) state and 
are discussed in more detail below.
+- Introduce a New Materialized Table for Simplifying Data Pipelines: 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
 is designed to simplify the development of
+data processing pipelines. Using dynamic tables with uniform SQL statements and 
a freshness specification, users can define batch
+and streaming transformations to data in the same way, accelerate ETL pipeline 
development, and manage task scheduling
+automatically. See below for more details on this exciting feature.
+- Unified File Merging Mechanism for Checkpoints: The unified file merging 
mechanism for checkpointing is introduced to
+Flink 1.20 as an MVP feature, which allows scattered small checkpoint files to 
be written into larger files, reducing
+the number of file creations and file deletions and alleviating the pressure 
of file system metadata management raised by
+the file flooding problem during checkpoints.
+
+# Flink SQL Improvements
+
+## Introduce Materialized Tables
+
+We introduced the Materialized Table abstraction in Flink SQL, a new table type 
designed to simplify both batch and streaming
+data pipelines while providing a consistent development experience.
+
+Materialized tables are defined with a query and a data freshness 
specification. The engine automatically derives the table
+schema and creates a data refresh pipeline to maintain the query result with 
the requested freshness. Users are relieved from
+the burden of comprehending the concepts and differences between streaming and 
batch processing, and they do not have to directly
+maintain Flink streaming or batch jobs. All operations are done on 
Materialized tables, which can significantly accelerate ETL pipeline
+development.
+
+Here is an example to create a materialized table that is constantly refreshed 
with a data freshness of `3` minutes.
+
+```sql
+-- 1. Create table schema and data refresh pipeline
+CREATE MATERIALIZED TABLE dwd_orders
+(
+ PRIMARY KEY(ds, id) NOT ENFORCED
+)
+PARTITIONED BY (ds)
+FRESHNESS = INTERVAL '3' MINUTE
+AS SELECT 
+ o.ds,
+ o.id,
+ o.order_number,
+ o.user_id,
+...
+FROM 
+ orders as o
+ LEFT JOIN products FOR SYSTEM_TIME AS OF proctime() AS prod
+ ON o.product_id = prod.id
+ LEFT JOIN order_pay AS pay
+ ON o.id = pay.order_id and o.ds = pay.ds;
+
+-- 2. Pause the data refresh pipeline
+ALTER MATERIALIZED TABLE dwd_orders SUSPEND;
+
+-- 3. Resume the data refresh pipeline
+ALTER MATERIALIZED TABLE dwd_orders RESUME
+-- Set table option via WITH clause
+WITH(
+ 'sink.parallelism' = '10'
+);
+
+-- Refresh historical partition manually
+ALTER MATERIALIZED TABLE dwd_orders REFRESH PARTITION(ds='20231023');
+```
+
+**More Information**
+* [FLINK-35187](https://issues.apache.org/jira/browse/FLINK-35187)
+* 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
+* [Materialized Table 
Overview](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/materialized-table/overview/)
+
+
+## Introduce Catalog-Related Syntax
+
+With the growing adoption of Flink SQL, implementations of Flink's `Catalog` 
interface play an increasingly important role. Today, Flink features a JDBC and 
a Hive catalog implementation, and other open source projects such as Apache 
Paimon integrate with this interface as well.
+
+Now in Flink 1.20, you can use the `DQL` syntax to obtain detailed metadata 
from existing catalogs, and the
+`DDL` syntax to modify metadata such as properties or comments of a specified catalog.
+
+```sql
+Flink SQL> CREATE CATALOG `cat` WITH ('type'='generic_in_memory', 
'default-database'='db');
+[INFO] Execute statement succeeded.
+
+Flink SQL> SHOW CREATE CATALOG `cat`;
++---------------------------------------------------------------------------------------------+
+|                                                                              
        result |
++---------------------------------------------------------------------------------------------+
+| CREATE CATALOG `cat` WITH (
+  'default-database' = 'db',
+  'type' = 'generic_in_memory'
+)
+|
++---------------------------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> DESCRIBE CATALOG `cat`;
++-----------+-------------------+
+| info name |        info value |
++-----------+-------------------+
+|      name |               cat |
+|      type | generic_in_memory |
+|   comment |                   |
++-----------+-------------------+
+3 rows in set
+
+Flink SQL> ALTER CATALOG `cat` SET ('default-database'='new-db');
+[INFO] Execute statement succeeded.
+
+Flink SQL> SHOW CREATE CATALOG `cat`;
++-------------------------------------------------------------------------------------------------+
+|                                                                              
            result |
++-------------------------------------------------------------------------------------------------+
+| CREATE CATALOG `cat` WITH (
+  'default-database' = 'new-db',
+  'type' = 'generic_in_memory'
+)
+|
++-------------------------------------------------------------------------------------------------+
+1 row in set
+```
+
+**More Information**
+* [FLINK-34914](https://issues.apache.org/jira/browse/FLINK-34914)
+* 
[FLIP-436](https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax)
+
+
+## Add DISTRIBUTED BY Clause
+
+Many SQL engines expose the concepts of `Partitioning`, `Bucketing`, or 
`Clustering`. Flink 1.20 introduces
+the concept of `Bucketing`.
+
+Buckets enable load balancing in an external storage system by splitting data 
into disjoint subsets. Bucketing depends
+heavily on the semantics of the underlying connector. However, a user can 
influence the bucketing behavior by
+specifying the number of buckets, the bucketing algorithm, and (if the 
algorithm allows it) the columns which 
+are used for target bucket calculation. All bucketing components (i.e. bucket 
number, distribution algorithm, bucket key columns)
+are optional from a SQL syntax perspective.
+
+Take the following SQL statements as an example:
+
+```sql
+-- declares a hash function on a fixed number of 4 buckets (i.e. HASH(uid) % 4 
= target bucket).
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY HASH(uid) INTO 4 
BUCKETS;
+
+-- leaves the selection of an algorithm up to the connector.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid) INTO 4 
BUCKETS;
+
+-- leaves the number of buckets up to the connector.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid);
+
+-- only defines the number of buckets.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED INTO 4 BUCKETS;
+```
+
+**More Information**
+* [FLINK-33494](https://issues.apache.org/jira/browse/FLINK-33494)
+* 
[FLIP-376](https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/create/#distributed)
+
+# State & Checkpoint Improvements
+
+## Unified File Merging Mechanism for Checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature.
+It combines multiple small checkpoint files into fewer, larger files, which 
reduces the number of file creation
+and file deletion operations and alleviates the pressure of file system 
metadata management during checkpoints.
+
+The mechanism can be enabled by setting 
`execution.checkpointing.file-merging.enabled` to `true`. For more advanced 
options
+and the principles behind this feature, please refer to the Checkpointing 
documentation.
+
+**More Information**
+* [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)
+* 
[FLIP-306](https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/fault-tolerance/checkpointing/#unify-file-merging-mechanism-for-checkpoints-experimental)
+
+## Compaction of Small SST Files
+
+In some cases, the number of files produced by the RocksDB state backend grows 
indefinitely. In addition to the overhead caused by many small files,
+this behavior can cause the task state info to exceed the RPC message size 
limit and therefore lead to recovery or checkpoint failures.
+
+From release 1.20 on, Flink can merge such files in the background using the 
RocksDB API.
+
+**More Information**
+* [FLINK-26050](https://issues.apache.org/jira/browse/FLINK-26050)
+
+## Reorganizing all State & Checkpointing Configuration Options
+
+In Flink 1.20, all the options about state and checkpointing are reorganized 
and categorized by prefixes as listed below:
+1. `execution.checkpointing.*`: all configuration options associated with 
checkpointing and savepoints.
+2. `execution.state-recovery.*`: all configuration options related to state 
recovery.
+3. `state.*`: all configuration options related to state access.
+   a. `state.backend.*`: configuration options for individual state backends, 
such as RocksDB.
+   b. `state.changelog.*`: configuration options for the state changelog, as 
outlined in FLIP-158, including the options for the "Durable Short-term Log" 
(DSTL).
+   c. `state.latency-track.*`: configuration options related to the latency 
tracking of state access.
+
+Meanwhile, all the original scattered options are annotated 
as `@Deprecated`. For the detailed list of options,
+please refer to the documents below.
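+
+To make the new layout concrete, the following sketch sets one option from each prefix. `execution.checkpointing.interval` and `state.backend.type` are existing options; the recovery-path key is assumed from the `execution.state-recovery.*` prefix, so please verify it against the documents linked below:
+
+```java
+import org.apache.flink.configuration.Configuration;
+
+public class StateConfigLayoutExample {
+    public static void main(String[] args) {
+        Configuration conf = new Configuration();
+
+        // 1. execution.checkpointing.*  -- checkpointing and savepoint behaviour
+        conf.setString("execution.checkpointing.interval", "60 s");
+
+        // 2. execution.state-recovery.*  -- state recovery (key name assumed from the prefix)
+        conf.setString("execution.state-recovery.path", "s3://my-bucket/savepoint-123");
+
+        // 3. state.*  -- state access, e.g. which state backend to use
+        conf.setString("state.backend.type", "rocksdb");
+
+        System.out.println(conf);
+    }
+}
+```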
+
+**More Information**
+* 
[Checkpointing](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#checkpointing)
+* 
[Recovery](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#recovery)
+* [State 
Backend](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-backends)
+* [State 
Changelog](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-changelog-options)
+* 
[Latency-track](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-latency-tracking-options)
+
+# Batch Processing Improvements
+
+## Support Job Recovery from JobMaster Failures for Batch Jobs
+
+In Flink 1.20, we introduced a batch job recovery mechanism to enable batch 
jobs to recover as much progress as possible 
+after a `JobMaster` failover, avoiding the need to rerun tasks that have 
already been finished.
+
+**More Information**
+* [FLINK-33892](https://issues.apache.org/jira/browse/FLINK-33892)
+* 
[FLIP-383](https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+from+JobMaster+Failures+for+Batch+Jobs)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/batch/recovery_from_job_master_failure/)
+
+## Support Dynamic Parallelism Inference for HiveSource
+
+In Flink 1.20, we added support for dynamic source parallelism inference in 
batch jobs for the `Hive` source connector. 
+This allows the connector to dynamically determine parallelism based on the 
actual partitions, taking dynamic partition pruning into account. 
+Additionally, we have introduced a new configuration option, 
`table.exec.hive.infer-source-parallelism.mode`, to enable users
+to choose between static and dynamic inference modes for source parallelism.
+
+It should be noted that in Flink 1.20, the previous configuration option 
`table.exec.hive.infer-source-parallelism` has been 
+marked as deprecated, but it will continue to serve as a switch for automatic 
parallelism inference until it is fully phased out.
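+
+A minimal sketch of selecting the inference mode via the Table API is shown below. The option key comes from the paragraph above, while the value `DYNAMIC` is an assumption and should be checked against the FLIP-445 documentation:
+
+```java
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+
+public class HiveParallelismInferenceExample {
+    public static void main(String[] args) {
+        TableEnvironment tEnv =
+                TableEnvironment.create(EnvironmentSettings.inBatchMode());
+
+        // Option key from the release notes; the value name is an assumption.
+        tEnv.getConfig().set("table.exec.hive.infer-source-parallelism.mode", "DYNAMIC");
+    }
+}
+```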
+
+**More Information**
+* [FLINK-35293](https://issues.apache.org/jira/browse/FLINK-35293)
+* 
[FLIP-445](https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource)
+
+# DataStream API Improvements
+
+The `DataSet` API has already been formally deprecated and will be removed in 
Flink 2.0.
+Flink users are recommended to migrate from the `DataSet` API to the 
`DataStream` API, `Table` API and `SQL` for their data 
+processing requirements.
+
+## Support Full Partition Processing on DataStream API
+
+Before 1.20, the `DataStream` API did not directly support aggregations on 
non-keyed streams (subtask-scope aggregations). 
+As a workaround, users could assign the subtask id to the records before 
turning the stream into a keyed stream, which incurred additional overhead.
+Flink 1.20 adds built-in support for these operations via the 
`FullPartitionWindow` API.
+
+Suppose we want to count the number of records in each partition and output to 
a downstream operator. This can be done as follows:
+
+```java
+inputStream.fullWindowPartition()
+        .mapPartition(
+                new MapPartitionFunction<Record, Long>() {
+                    @Override
+                    public void mapPartition(
+                            Iterable<Record> values, Collector<Long> out)
+                            throws Exception {
+                        long counter = 0;
+                        for (Record value : values) {
+                            counter++;
+                        }
+                        out.collect(counter);
+                    }
+                });
+```
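+
+For context, here is a self-contained sketch that wires the same per-partition count into a small job. It assumes a bounded source and that the result of `mapPartition` can be consumed like a regular `DataStream`; the source and values are illustrative:
+
+```java
+import org.apache.flink.api.common.functions.MapPartitionFunction;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.util.Collector;
+
+public class FullPartitionCountExample {
+    public static void main(String[] args) throws Exception {
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+        env.setParallelism(2);
+
+        // A bounded input, so each subtask's full partition eventually completes.
+        DataStream<Long> input = env.fromSequence(1, 100);
+
+        input.fullWindowPartition()
+                .mapPartition(
+                        new MapPartitionFunction<Long, Long>() {
+                            @Override
+                            public void mapPartition(Iterable<Long> values, Collector<Long> out) {
+                                long counter = 0;
+                                for (Long ignored : values) {
+                                    counter++;
+                                }
+                                out.collect(counter); // one count per parallel subtask
+                            }
+                        })
+                .print();
+
+        env.execute("Count records per partition");
+    }
+}
+```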
+
+**More Information**
+* [FLINK-34543](https://issues.apache.org/jira/browse/FLINK-34543)
+* 
[FLIP-380](https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream)
+
+# Important Configuration Changes for Flink 2.0
+
+As Apache Flink progresses to version 2.0, several configuration options are 
being changed or deprecated to 
+improve ease-of-use and maintainability.
+
+## Using Proper Type for Configuration Options
+
+Some configuration options, like `client.heartbeat.interval`, have been updated 
to the `Duration` type in a backward-compatible manner. 
+The full list is available in 
[FLINK-35359](https://issues.apache.org/jira/browse/FLINK-35359).
+
+The following configuration options have been updated to the `Enum` type in a 
backward-compatible manner:
+- `taskmanager.network.compression.codec`
+- `table.optimizer.agg-phase-strategy`
+
+The following configuration options have been updated to the `Int` type in a 
backward-compatible manner:
+- `yarn.application-attempts`
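+
+As a small illustration of the backward-compatible behaviour, both the new human-readable duration format and the legacy plain-number form (previously interpreted as milliseconds) should be accepted:
+
+```java
+import org.apache.flink.configuration.Configuration;
+
+public class DurationOptionExample {
+    public static void main(String[] args) {
+        Configuration conf = new Configuration();
+
+        // New style: a human-readable Duration value.
+        conf.setString("client.heartbeat.interval", "30 s");
+
+        // Legacy style (plain number in milliseconds) keeps working after the migration.
+        conf.setString("client.heartbeat.interval", "30000");
+
+        System.out.println(conf);
+    }
+}
+```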
+
+**More Information**
+* [FLINK-35359](https://issues.apache.org/jira/browse/FLINK-35359)
+
+## Deprecate Multiple Configuration Options
+
+In preparation for the release of Flink 2.0, the community has decided to 
officially deprecate
+multiple configuration options that had been approaching end of life for a while.
+
+The following configuration options have been deprecated as we are phasing out 
the hash-based blocking shuffle:
+- `taskmanager.network.sort-shuffle.min-parallelism`
+- `taskmanager.network.blocking-shuffle.type`
+
+The following configuration options have been deprecated as we are phasing out 
the legacy hybrid shuffle:
+- `taskmanager.network.hybrid-shuffle.spill-index-region-group-size`
+- `taskmanager.network.hybrid-shuffle.num-retained-in-memory-regions-max`
+- `taskmanager.network.hybrid-shuffle.enable-new-mode`
+
+The following configuration options have been deprecated to simplify the 
configuration of network buffers:
+- `taskmanager.network.memory.buffers-per-channel`
+- `taskmanager.network.memory.floating-buffers-per-gate`
+- `taskmanager.network.memory.max-buffers-per-channel`
+- `taskmanager.network.memory.max-overdraft-buffers-per-gate`
+- `taskmanager.network.memory.exclusive-buffers-request-timeout-ms` (Please 
use `taskmanager.network.memory.buffers-request-timeout` instead.)
+
+The configuration option 
`taskmanager.network.batch-shuffle.compression.enabled` has been deprecated. 
Please set `taskmanager.network.compression.codec` to `NONE` to disable 
compression.
+
+The following Netty-related configuration options are no longer recommended 
for use and have been deprecated:
+- `taskmanager.network.netty.num-arenas`
+- `taskmanager.network.netty.server.numThreads`
+- `taskmanager.network.netty.client.numThreads`
+- `taskmanager.network.netty.server.backlog`
+- `taskmanager.network.netty.sendReceiveBufferSize`
+- `taskmanager.network.netty.transport`
+
+The following configuration options are unnecessary and have been deprecated:
+- `taskmanager.network.max-num-tcp-connections`
+- `fine-grained.shuffle-mode.all-blocking`
+
+These options were previously used for fine-tuning TPC testing but are no 
longer needed by the current Flink planner:
+- `table.exec.range-sort.enabled`
+- `table.optimizer.rows-per-local-agg`
+- `table.optimizer.join.null-filter-threshold`
+- `table.optimizer.semi-anti-join.build-distinct.ndv-ratio`
+- `table.optimizer.shuffle-by-partial-key-enabled`
+- `table.optimizer.smj.remove-sort-enabled`
+- `table.optimizer.cnf-nodes-limit`
+
+These options were introduced for the now-obsolete `FilterableTableSource` 
interface:
+- `table.optimizer.source.aggregate-pushdown-enabled`
+- `table.optimizer.source.predicate-pushdown-enabled`
+
+**More Information**
+* [FLINK-35461](https://issues.apache.org/jira/browse/FLINK-35461)
+* [FLINK-35473](https://issues.apache.org/jira/browse/FLINK-35473)
+
+## New and Updated Configuration Options

Review Comment:
   It is quite obvious that the configuration changes in these release notes were 
provided by multiple people, as they are presented in different formats / 
layouts. Maybe we should make them more consistent, so that users see the 
release notes as unified documentation provided by the Flink community, 
instead of simply throwing multiple pieces of text together.



##########
docs/content/posts/2024-07-15-release-1.20.0.md:
##########
@@ -0,0 +1,560 @@
+---
+authors:
+- reswqa:
+  name: "Weijie Guo"
+  twitter: "WeijieGuo12"
+- 1996fanrui:
+  name: "Rui Fan"
+  twitter: "1996fanrui"
+
+date: "2024-07-15T08:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.20
+aliases:
+- /news/2024/07/15/release-1.20.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 
1.20.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new 
features. Overall, 142
+people contributed to this release, completing 13 FLIPs and 300+ issues. Thank 
you!
+
+Let's dive into the highlights.
+
+# Standing on the Eve of Apache Flink 2.0
+
+Flink 1.0 was released seven years ago. For several months, the community has been 
actively planning and taking steps towards
+the next major release. The new 1.20 release is planned to be the last minor 
release before Flink 2.0, which is anticipated by the end of 2024.
+
+Starting from Flink 1.19, the community has decided to officially deprecate 
multiple APIs that had been approaching 
+end of life for a while. In 1.20, we further sorted through all relevant APIs 
that might need to be replaced
+or deprecated to clear the way for the 2.0 release:
+- Configuration Improvements: As Flink moves towards version 2.0, we have 
revisited all runtime & Table API/SQL 
+configuration options and identified several opportunities to enhance 
user-friendliness and maintainability.
+- Deprecate the Legacy 
[SinkFunction](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.html)
 API: Since its introduction in Flink 1.12, the [Unified Sink 
API](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/api/connector/sink2/Sink.html)
+has undergone extensive development and testing. Over multiple release cycles, 
the API has demonstrated
+stability and robustness, aligning with the criteria set forth in 
[FLIP-197](https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process)
 for API stability graduation.
+Therefore, we promote the Unified Sink API v2 to `@Public` and deprecate the 
legacy `SinkFunction` interface.
+
+It has been seven years since the Flink community's last major release, and we 
have great expectations for Flink 2.0.
+We are planning to release several high-impact features in `2.x`. Some of them 
have already been introduced in Flink 1.20 in an MVP (minimum viable product) state and 
are discussed in more detail below.
+- Introduce a New Materialized Table for Simplifying Data Pipelines: 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
 is designed to simplify the development of
+data processing pipelines. Using dynamic tables with uniform SQL statements and 
a freshness specification, users can define batch
+and streaming transformations to data in the same way, accelerate ETL pipeline 
development, and manage task scheduling
+automatically. See below for more details on this exciting feature.
+- Unified File Merging Mechanism for Checkpoints: The unified file merging 
mechanism for checkpointing is introduced to
+Flink 1.20 as an MVP feature, which allows scattered small checkpoint files to 
be written into larger files, reducing
+the number of file creations and file deletions and alleviating the pressure 
of file system metadata management raised by
+the file flooding problem during checkpoints.
+
+# Flink SQL Improvements
+
+## Introduce Materialized Tables
+
+We introduced the Materialized Table abstraction in Flink SQL, a new table type 
designed to simplify both batch and streaming
+data pipelines while providing a consistent development experience.
+
+Materialized tables are defined with a query and a data freshness 
specification. The engine automatically derives the table
+schema and creates a data refresh pipeline to maintain the query result with 
the requested freshness. Users are relieved from
+the burden of comprehending the concepts and differences between streaming and 
batch processing, and they do not have to directly
+maintain Flink streaming or batch jobs. All operations are done on 
Materialized tables, which can significantly accelerate ETL pipeline
+development.
+
+Here is an example to create a materialized table that is constantly refreshed 
with a data freshness of `3` minutes.
+
+```sql
+-- 1. Create table schema and data refresh pipeline
+CREATE MATERIALIZED TABLE dwd_orders
+(
+ PRIMARY KEY(ds, id) NOT ENFORCED
+)
+PARTITIONED BY (ds)
+FRESHNESS = INTERVAL '3' MINUTE
+AS SELECT 
+ o.ds,
+ o.id,
+ o.order_number,
+ o.user_id,
+...
+FROM 
+ orders as o
+ LEFT JOIN products FOR SYSTEM_TIME AS OF proctime() AS prod
+ ON o.product_id = prod.id
+ LEFT JOIN order_pay AS pay
+ ON o.id = pay.order_id and o.ds = pay.ds;
+
+-- 2. Pause the data refresh pipeline
+ALTER MATERIALIZED TABLE dwd_orders SUSPEND;
+
+-- 3. Resume the data refresh pipeline
+ALTER MATERIALIZED TABLE dwd_orders RESUME
+-- Set table option via WITH clause
+WITH(
+ 'sink.parallelism' = '10'
+);
+
+-- Refresh historical partition manually
+ALTER MATERIALIZED TABLE dwd_orders REFRESH PARTITION(ds='20231023');
+```
+
+**More Information**
+* [FLINK-35187](https://issues.apache.org/jira/browse/FLINK-35187)
+* 
[FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
+* [Materialized Table 
Overview](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/materialized-table/overview/)
+
+
+## Introduce Catalog-Related Syntax
+
+With the growing adoption of Flink SQL, implementations of Flink's `Catalog` 
interface play an increasingly important role. Today, Flink features a JDBC and 
a Hive catalog implementation, and other open source projects such as Apache 
Paimon integrate with this interface as well.
+
+Now in Flink 1.20, you can use the `DQL` syntax to obtain detailed metadata 
from existing catalogs, and the
+`DDL` syntax to modify metadata such as properties or comments of a specified catalog.
+
+```sql
+Flink SQL> CREATE CATALOG `cat` WITH ('type'='generic_in_memory', 
'default-database'='db');
+[INFO] Execute statement succeeded.
+
+Flink SQL> SHOW CREATE CATALOG `cat`;
++---------------------------------------------------------------------------------------------+
+|                                                                              
        result |
++---------------------------------------------------------------------------------------------+
+| CREATE CATALOG `cat` WITH (
+  'default-database' = 'db',
+  'type' = 'generic_in_memory'
+)
+|
++---------------------------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> DESCRIBE CATALOG `cat`;
++-----------+-------------------+
+| info name |        info value |
++-----------+-------------------+
+|      name |               cat |
+|      type | generic_in_memory |
+|   comment |                   |
++-----------+-------------------+
+3 rows in set
+
+Flink SQL> ALTER CATALOG `cat` SET ('default-database'='new-db');
+[INFO] Execute statement succeeded.
+
+Flink SQL> SHOW CREATE CATALOG `cat`;
++-------------------------------------------------------------------------------------------------+
+|                                                                              
            result |
++-------------------------------------------------------------------------------------------------+
+| CREATE CATALOG `cat` WITH (
+  'default-database' = 'new-db',
+  'type' = 'generic_in_memory'
+)
+|
++-------------------------------------------------------------------------------------------------+
+1 row in set
+```
+
+**More Information**
+* [FLINK-34914](https://issues.apache.org/jira/browse/FLINK-34914)
+* 
[FLIP-436](https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax)
+
+
+## Add DISTRIBUTED BY Clause
+
+Many SQL engines expose the concepts of `Partitioning`, `Bucketing`, or 
`Clustering`. Flink 1.20 introduces
+the concept of `Bucketing`.
+
+Buckets enable load balancing in an external storage system by splitting data 
into disjoint subsets. Bucketing depends
+heavily on the semantics of the underlying connector. However, a user can 
influence the bucketing behavior by
+specifying the number of buckets, the bucketing algorithm, and (if the 
algorithm allows it) the columns which 
+are used for target bucket calculation. All bucketing components (i.e. bucket 
number, distribution algorithm, bucket key columns)
+are optional from a SQL syntax perspective.
+
+Take the following SQL statements as an example:
+
+```sql
+-- declares a hash function on a fixed number of 4 buckets (i.e. HASH(uid) % 4 
= target bucket).
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY HASH(uid) INTO 4 
BUCKETS;
+
+-- leaves the selection of an algorithm up to the connector.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid) INTO 4 
BUCKETS;
+
+-- leaves the number of buckets up to the connector.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid);
+
+-- only defines the number of buckets.
+CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED INTO 4 BUCKETS;
+```
+
+**More Information**
+* [FLINK-33494](https://issues.apache.org/jira/browse/FLINK-33494)
+* 
[FLIP-376](https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/create/#distributed)
+
+# State & Checkpoint Improvements
+
+## Unified File Merging Mechanism for Checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature.
+It combines multiple small checkpoint files into fewer, larger files, which 
reduces the number of file creation
+and file deletion operations and alleviates the pressure of file system 
metadata management during checkpoints.
+
+The mechanism can be enabled by setting 
`execution.checkpointing.file-merging.enabled` to `true`. For more advanced 
options
+and the principles behind this feature, please refer to the Checkpointing 
documentation.
+
+**More Information**
+* [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)
+* 
[FLIP-306](https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/fault-tolerance/checkpointing/#unify-file-merging-mechanism-for-checkpoints-experimental)
+
+## Compaction of Small SST Files
+
+In some cases, the number of files produced by the RocksDB state backend grows 
indefinitely. In addition to the overhead caused by many small files,
+this behavior can cause the task state info to exceed the RPC message size 
limit and therefore lead to recovery or checkpoint failures.
+
+From release 1.20 on, Flink can merge such files in the background using the 
RocksDB API.
+
+**More Information**
+* [FLINK-26050](https://issues.apache.org/jira/browse/FLINK-26050)
+
+## Reorganizing all State & Checkpointing Configuration Options
+
+In Flink 1.20, all the options about state and checkpointing are reorganized 
and categorized by prefixes as listed below:
+1. `execution.checkpointing.*`: all configuration options associated with 
checkpointing and savepoints.
+2. `execution.state-recovery.*`: all configuration options related to state 
recovery.
+3. `state.*`: all configuration options related to state access.
+   a. `state.backend.*`: configuration options for individual state backends, 
such as RocksDB.
+   b. `state.changelog.*`: configuration options for the state changelog, as 
outlined in FLIP-158, including the options for the "Durable Short-term Log" 
(DSTL).
+   c. `state.latency-track.*`: configuration options related to the latency 
tracking of state access.
+
+Meanwhile, all the original scattered options are annotated 
as `@Deprecated`. For the detailed list of options,
+please refer to the documents below.
+
+**More Information**
+* 
[Checkpointing](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#checkpointing)
+* 
[Recovery](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#recovery)
+* [State 
Backend](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-backends)
+* [State 
Changelog](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-changelog-options)
+* 
[Latency-track](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-latency-tracking-options)
+
+# Batch Processing Improvements
+
+## Support Job Recovery from JobMaster Failures for Batch Jobs
+
+In Flink 1.20, we introduced a batch job recovery mechanism to enable batch 
jobs to recover as much progress as possible 
+after a `JobMaster` failover, avoiding the need to rerun tasks that have 
already been finished.
+
+**More Information**
+* [FLINK-33892](https://issues.apache.org/jira/browse/FLINK-33892)
+* 
[FLIP-383](https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+from+JobMaster+Failures+for+Batch+Jobs)
+* 
[Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/batch/recovery_from_job_master_failure/)
+
+## Support Dynamic Parallelism Inference for HiveSource
+
+In Flink 1.20, we added support for dynamic source parallelism inference in 
batch jobs for the `Hive` source connector. 
+This allows the connector to dynamically determine parallelism based on the 
actual partitions, taking dynamic partition pruning into account. 
+Additionally, we have introduced a new configuration option, 
`table.exec.hive.infer-source-parallelism.mode`, to enable users
+to choose between static and dynamic inference modes for source parallelism.
+
+It should be noted that in Flink 1.20, the previous configuration option 
`table.exec.hive.infer-source-parallelism` has been 
+marked as deprecated, but it will continue to serve as a switch for automatic 
parallelism inference until it is fully phased out.
+
+**More Information**
+* [FLINK-35293](https://issues.apache.org/jira/browse/FLINK-35293)
+* 
[FLIP-445](https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource)
+
+# DataStream API Improvements
+
+The `DataSet` API has already been formally deprecated and will be removed in 
Flink 2.0.
+Flink users are recommended to migrate from the `DataSet` API to the 
`DataStream` API, `Table` API and `SQL` for their data 
+processing requirements.
+
+## Support Full Partition Processing on DataStream API
+
+Before 1.20, the `DataStream` API did not directly support aggregations on 
non-keyed streams (subtask-scope aggregations). 
+As a workaround, users could assign the subtask id to the records before 
turning the stream into a keyed stream, which incurred additional overhead.
+Flink 1.20 adds built-in support for these operations via the 
`FullPartitionWindow` API.
+
+Suppose we want to count the number of records in each partition and output to 
a downstream operator. This can be done as follows:
+
+```java
+inputStream.fullWindowPartition()
+        .mapPartition(
+                new MapPartitionFunction<Record, Long>() {
+                    @Override
+                    public void mapPartition(
+                            Iterable<Record> values, Collector<Long> out)
+                            throws Exception {
+                        long counter = 0;
+                        for (Record value : values) {
+                            counter++;
+                        }
+                        out.collect(counter);
+                    }
+                });
+```
+
+**More Information**
+* [FLINK-34543](https://issues.apache.org/jira/browse/FLINK-34543)
+* 
[FLIP-380](https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream)
+
+# Important Configuration Changes for Flink 2.0
+
+As Apache Flink progresses to version 2.0, several configuration options are 
being changed or deprecated to 
+improve ease-of-use and maintainability.
+
+## Using Proper Type for Configuration Options
+
+Some configuration options, like `client.heartbeat.interval`, have been updated 
to the `Duration` type in a backward-compatible manner. 
+The full list is available in 
[FLINK-35359](https://issues.apache.org/jira/browse/FLINK-35359).
+
+The following configuration options have been updated to the `Enum` type in a 
backward-compatible manner:
+- `taskmanager.network.compression.codec`
+- `table.optimizer.agg-phase-strategy`
+
+The following configuration options have been updated to the `Int` type in a 
backward-compatible manner:
+- `yarn.application-attempts`
+
+**More Information**
+* [FLINK-35359](https://issues.apache.org/jira/browse/FLINK-35359)
+
+## Deprecate Multiple Configuration Options
+
+In preparation for the release of Flink 2.0, the community has decided to 
officially deprecate
+multiple configuration options that had been approaching end of life for a while.
+
+The following configuration options have been deprecated as we are phasing out 
the hash-based blocking shuffle:

Review Comment:
   +1 for mentioning the replacements



##########
docs/content/posts/2024-07-15-release-1.20.0.md:
##########
@@ -0,0 +1,560 @@
+---
+authors:
+- reswqa:
+  name: "Weijie Guo"
+  twitter: "WeijieGuo12"
+- 1996fanrui:
+  name: "Rui Fan"
+  twitter: "1996fanrui"
+
+date: "2024-07-15T08:00:00Z"
+subtitle: ""
+title: Announcing the Release of Apache Flink 1.20
+aliases:
+- /news/2024/07/15/release-1.20.0.html
+---
+
+The Apache Flink PMC is pleased to announce the release of Apache Flink 
1.20.0. As usual, we are
+looking at a packed release with a wide variety of improvements and new 
features. Overall, 142
+people contributed to this release, completing 13 FLIPs and 300+ issues. Thank 
you!
+
+Let's dive into the highlights.
+
+# Standing on the Eve of Apache Flink 2.0
+
+Flink 1.0 was released seven years ago. For several months, the community has been 
actively planning and taking steps towards
+the next major release. The new 1.20 release is planned to be the last minor 
release before Flink 2.0, which is anticipated by the end of 2024.
+
+Starting from Flink 1.19, the community has decided to officially deprecate 
multiple APIs that had been approaching 
+end of life for a while. In 1.20, we further sorted through all relevant APIs 
that might need to be replaced
+or deprecated to clear the way for the 2.0 release:
+- Configuration Improvements: As Flink moves towards version 2.0, we have 
revisited all runtime & Table API/SQL 
+configuration options and identified several opportunities to enhance 
user-friendliness and maintainability.
+- Deprecate the Legacy 
[SinkFunction](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.html)
 API: Since its introduction in Flink 1.12, the [Unified Sink 
API](https://nightlies.apache.org/flink/flink-docs-release-1.20/api/java/org/apache/flink/api/connector/sink2/Sink.html)
+has undergone extensive development and testing. Over multiple release cycles, 
the API has demonstrated
+stability and robustness, aligning with the criteria set forth in 
[FLIP-197](https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process)
 for API stability graduation.
+Therefore, we promote the Unified Sink API v2 to `@Public` and deprecate the 
legacy `SinkFunction` interface.
+
+It has been seven years since the Flink community's last major release, and we 
have great expectations for Flink 2.0.
+We are planning to release several high-impact features in `2.x`. Some of them 
have already been introduced in Flink 1.20 in an MVP (minimum viable product) state and 
are discussed in more detail below.

Review Comment:
   This aligns with the feature radar in the project roadmap.
   https://flink.apache.org/what-is-flink/roadmap/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
