This is an automated email from the ASF dual-hosted git repository. lindong pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/flink-web.git
commit 400c5a7a43877b4d96b5ea6f4ae90497d2e0922c Author: Dong Lin <[email protected]> AuthorDate: Sun Apr 9 23:28:30 2023 +0800 Release Flink ML 2.2.0 --- docs/config.toml | 4 +- docs/content/posts/2023-04-19-release-ml-2.2.0.md | 120 ++++++++++++++++++++++ docs/data/flink_ml.yml | 7 ++ docs/data/release_archive.yml | 3 + 4 files changed, 132 insertions(+), 2 deletions(-) diff --git a/docs/config.toml b/docs/config.toml index 3af07ec24..fb003fda8 100644 --- a/docs/config.toml +++ b/docs/config.toml @@ -37,8 +37,8 @@ posts = "/:year/:month/:day/:title/" FlinkStableShortVersion = "1.17" StateFunStableVersion = "3.2.0" StateFunStableShortVersion = "3.2" - FlinkMLStableVersion = "2.1.0" - FlinkMLStableShortVersion = "2.1" + FlinkMLStableVersion = "2.2.0" + FlinkMLStableShortVersion = "2.2" FlinkKubernetesOperatorStableVersion = "1.4.0" FlinkKubernetesOperatorStableShortVersion = "1.4" FlinkTableStoreStableVersion = "0.3.0" diff --git a/docs/content/posts/2023-04-19-release-ml-2.2.0.md b/docs/content/posts/2023-04-19-release-ml-2.2.0.md new file mode 100644 index 000000000..a1c97e03b --- /dev/null +++ b/docs/content/posts/2023-04-19-release-ml-2.2.0.md @@ -0,0 +1,120 @@ +--- +authors: +- lindong: null + name: Dong Lin +date: "2023-04-19T08:00:00Z" +excerpt: The Apache Flink community is excited to announce the release of Flink + ML 2.2.0! This release focuses on enriching Flink ML's feature engineering + algorithms. The library now includes 33 feature engineering algorithms, making + it a more comprehensive library for feature engineering tasks. +title: Apache Flink ML 2.2.0 Release Announcement +aliases: +- /news/2023/04/19/release-ml-2.2.0.html +--- + +The Apache Flink community is excited to announce the release of Flink ML 2.2.0! +This release focuses on enriching Flink ML's feature engineering algorithms. The +library now includes 33 feature engineering algorithms, making it a more +comprehensive library for feature engineering tasks. + +With the addition of these algorithms, we believe Flink ML library is ready for +use in production jobs that require feature engineering capabilities, whose +input can then be consumed by both offline and online machine learning tasks. + +We encourage you to [download the +release](https://flink.apache.org/downloads.html) and share your feedback with +the community through the Flink [mailing +lists](https://flink.apache.org/community.html#mailing-lists) or +[JIRA](https://issues.apache.org/jira/browse/flink)! We hope you like the new +release and we’d be eager to learn about your experience with it. + +# Notable Features + +## Introduced API and infrastructure for online serving + +In machine learning, one of the main goals of model training is to deploy the +trained model to perform online inference, where the model server must respond +to incoming requests with millisecond-level latency. However, prior releases of +Flink ML only supported nearline inference using the Flink runtime, which may +not meet the requirements of online inference use-cases. + +With +[FLIP-289](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268), +Flink ML now provides an API and infrastructure for users to load a +ModelServable from model data generated by an Estimator. This ModelServable can +be replicated across multiple model servers to process online inference requests +in parallel. As the ModelServable is effectively a UDF that does not rely on +Flink runtime, it can also be integrated as a UDF into other serving or +processing frameworks to serve the model trained by Flink ML. + +As a first step, the LogisticRegressionModelServable has been added to serve the +logistic regression model online, and more servables will be added in the +future. This new feature enables Flink ML to be used for both offline and online +machine learning tasks, making it more versatile for a wider range of use cases. + +## Added 27 feature engineering algorithms + +Flink ML 2.2.0 significantly expanded the coverage of feature engineering +algorithms, increasing the number from 6 to 33. Flink ML now covers 28 out of +the 33 feature engineering algorithms provided in Spark ML, making it a more +comprehensive library for feature engineering tasks. + +Feature engineering is a critical step in modern AI infrastructures as it can +preprocess data not only for traditional machine learning algorithms like GBT +but also for deep learning algorithms and large language models like +Transformer, which are increasingly popular. With the addition of these +algorithms, we hope Flink ML can be more useful in machine-learning tasks for +Flink users. + +All feature engineering algorithms can be easily accessed through the drop-down +list on the left side of +[this](https://nightlies.apache.org/flink/flink-ml-docs-master/docs/operators/feature/binarizer/) +Flink ML page. For each algorithm, we have provided Python and Java examples to +demonstrate how to use them. + +## Added two production-validated online learning algorithms + +Flink ML offers a significant advantage over other machine learning libraries in +terms of its ability to perform online learning using Flink's streaming runtime. +To leverage this strength, we implemented two online algorithms in Flink ML and +successfully used them in a production machine learning job at Alibaba. + +This job involves dynamically clustering similar logs and detecting errors in +the logs to help site reliability engineers. By using OnlineStandardScaler and +AgglomerativeClustering to standardize and cluster logs in real-time, the job is +able to update models more frequently with a much simpler infrastructure setup. +We presented this work at [Flink Forward Asia](https://flink-forward.org.cn/) +last year, and it will soon be integrated into the open-source project +[SREWorks](https://github.com/alibaba/SREWorks). + +With these online algorithms, Flink ML provides users with the ability to +continuously update models using new data in real-time, resulting in more +accurate and up-to-date predictions. This can be particularly useful in use +cases where data is constantly streaming in, and it's important to make quick +decisions based on the latest available information. + +# Upgrade Notes + +This release is fully backward compatible with Flink ML 2.1. Users should be +able to upgrade to Flink ML 2.2.0 without worrying about any incompatibilities +or breaking changes. + +# Release Notes and Resources + +Please take a look at the [release +notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351884) +for a detailed list of changes and new features. + +The binary distribution and source artifacts are now available on the updated +[Downloads page](https://flink.apache.org/downloads.html) of the Flink website, +and the most recent distribution of Flink ML Python package is available on +[PyPI](https://pypi.org/project/apache-flink-ml). + +# List of Contributors + +The Apache Flink community would like to thank each one of the contributors that +have made this release possible: + +Zhipeng Zhang, Dong Lin, Fan Hong, JiangXin, Zsombor Chikan, huangxingbo, +taosiyuan163, vacaly, weibozhao, yunfengzhou-hub + diff --git a/docs/data/flink_ml.yml b/docs/data/flink_ml.yml index c9a554599..c3f2db1a6 100644 --- a/docs/data/flink_ml.yml +++ b/docs/data/flink_ml.yml @@ -15,6 +15,13 @@ # specific language governing permissions and limitations # under the License +2.2: + name: "Apache Flink ML 2.2.0" + source_release_url: "https://www.apache.org/dyn/closer.lua/flink/flink-ml-2.2.0/flink-ml-2.2.0-src.tgz" + source_release_asc_url: "https://downloads.apache.org/flink/flink-ml-2.2.0/flink-ml-2.2.0-src.tgz.asc" + source_release_sha512_url: "https://downloads.apache.org/flink/flink-ml-2.2.0/flink-ml-2.2.0-src.tgz.sha512" + compatibility: ["1.15.*"] + 2.1: name: "Apache Flink ML 2.1.0" source_release_url: "https://www.apache.org/dyn/closer.lua/flink/flink-ml-2.1.0/flink-ml-2.1.0-src.tgz" diff --git a/docs/data/release_archive.yml b/docs/data/release_archive.yml index 658cfcb4b..408c54776 100644 --- a/docs/data/release_archive.yml +++ b/docs/data/release_archive.yml @@ -483,6 +483,9 @@ release_archive: release_date: 2017-07-27 flink_ml: + - version_short: 2.2 + version_long: 2.2.0 + release_date: 2023-04-19 - version_short: 2.1 version_long: 2.1.0 release_date: 2022-07-12
