Hi devs,

Yun and I would like to start a discussion on releasing Flink ML
<https://github.com/apache/flink-ml> 2.1.0.

In the past few months, we focused on improving the infra (e.g. memory
management, benchmark infra, online training, Python support) of Flink ML
by implementing, benchmarking, and optimizing 9 new algorithms in Flink ML.
Our benchmark results show that, for selected algorithms, Flink ML meets or
exceeds the performance of other popular ML libraries.

Please see below for a detailed list of improvements:

- A set of representative machine learning algorithms:
    - feature engineering:
        - MinMaxScaler (https://issues.apache.org/jira/browse/FLINK-25552)
        - StringIndexer (https://issues.apache.org/jira/browse/FLINK-25527)
        - VectorAssembler (https://issues.apache.org/jira/browse/FLINK-25616)
        - StandardScaler (https://issues.apache.org/jira/browse/FLINK-26626)
        - Bucketizer (https://issues.apache.org/jira/browse/FLINK-27072)
    - online learning:
        - OnlineKMeans (https://issues.apache.org/jira/browse/FLINK-26313)
        - OnlineLogisticRegression (https://issues.apache.org/jira/browse/FLINK-27170)
    - regression:
        - LinearRegression (https://issues.apache.org/jira/browse/FLINK-27093)
    - classification:
        - LinearSVC (https://issues.apache.org/jira/browse/FLINK-27091)
    - evaluation:
        - BinaryClassificationEvaluator (https://issues.apache.org/jira/browse/FLINK-27294)
- A benchmark framework for Flink ML (https://issues.apache.org/jira/browse/FLINK-26443)
- A website for Flink ML users (https://nightlies.apache.org/flink/flink-ml-docs-stable/)
- Python support for Flink ML algorithms (https://issues.apache.org/jira/browse/FLINK-26268, https://issues.apache.org/jira/browse/FLINK-26269)
- Several optimizations for Flink ML infrastructure (https://issues.apache.org/jira/browse/FLINK-27096, https://issues.apache.org/jira/browse/FLINK-27877)

With the improvements and throughput benchmarks we have made, we think it
is time to release Flink ML 2.1.0, so that interested developers in the
community can try out the new Flink ML infra to develop algorithms with
high throughput and low latency.

If you have any concerns, please let us know.


Best,
Yun and Zhipeng
