Hi devs,

Yun and I would like to start a discussion about releasing Flink ML <https://github.com/apache/flink-ml> 2.1.0.

In the past few months, we focused on improving the infra (e.g. memory management, benchmark infra, online training, Python support) of Flink ML by implementing, benchmarking, and optimizing 9 new algorithms in Flink ML. Our results have shown that Flink ML is able to meet or exceed the performance of selected algorithms in alternative popular ML libraries.

Please see below for a detailed list of improvements:

- A set of representative machine learning algorithms:
    - feature engineering
        - MinMaxScaler (https://issues.apache.org/jira/browse/FLINK-25552)
        - StringIndexer (https://issues.apache.org/jira/browse/FLINK-25527)
        - VectorAssembler (https://issues.apache.org/jira/browse/FLINK-25616)
        - StandardScaler (https://issues.apache.org/jira/browse/FLINK-26626)
        - Bucketizer (https://issues.apache.org/jira/browse/FLINK-27072)
    - online learning
        - OnlineKmeans (https://issues.apache.org/jira/browse/FLINK-26313)
        - OnlineLogisticRegression (https://issues.apache.org/jira/browse/FLINK-27170)
    - regression
        - LinearRegression (https://issues.apache.org/jira/browse/FLINK-27093)
    - classification
        - LinearSVC (https://issues.apache.org/jira/browse/FLINK-27091)
    - evaluation
        - BinaryClassificationEvaluator (https://issues.apache.org/jira/browse/FLINK-27294)
- A benchmark framework for Flink ML (https://issues.apache.org/jira/browse/FLINK-26443)
- A website for Flink ML users (https://nightlies.apache.org/flink/flink-ml-docs-stable/)
- Python support for Flink ML algorithms (https://issues.apache.org/jira/browse/FLINK-26268, https://issues.apache.org/jira/browse/FLINK-26269)
- Several optimizations for Flink ML infrastructure (https://issues.apache.org/jira/browse/FLINK-27096, https://issues.apache.org/jira/browse/FLINK-27877)

With these improvements and the throughput benchmarks we have run, we think it is time to release Flink ML 2.1.0, so that interested developers in the community can try out the new Flink ML infra to develop algorithms with high throughput and low latency.

If you have any concerns, please let us know.

Best,
Yun and Zhipeng