Hi Yun and Zhipeng,

Thanks a lot for starting the discussion.

+1 for the Flink ML 2.1.0 release. I am looking forward to these ML algorithms. I plan to write a blog post about PyFlink + Flink ML after the release.

Best,
Xingbo

Zhipeng Zhang <zhangzhipe...@gmail.com> wrote on Thu, Jun 23, 2022 at 11:15:

> Hi devs,
>
> Yun and I would like to start a discussion for releasing Flink ML
> <https://github.com/apache/flink-ml> 2.1.0.
>
> In the past few months, we focused on improving the infra (e.g. memory
> management, benchmark infra, online training, Python support) of Flink ML
> by implementing, benchmarking, and optimizing 9 new algorithms in Flink ML.
> Our results have shown that Flink ML is able to meet or exceed the
> performance of selected algorithms in alternative popular ML libraries.
>
> Please see below for a detailed list of improvements:
>
> - A set of representative machine learning algorithms:
>   - feature engineering:
>     - MinMaxScaler (https://issues.apache.org/jira/browse/FLINK-25552)
>     - StringIndexer (https://issues.apache.org/jira/browse/FLINK-25527)
>     - VectorAssembler (https://issues.apache.org/jira/browse/FLINK-25616)
>     - StandardScaler (https://issues.apache.org/jira/browse/FLINK-26626)
>     - Bucketizer (https://issues.apache.org/jira/browse/FLINK-27072)
>   - online learning:
>     - OnlineKmeans (https://issues.apache.org/jira/browse/FLINK-26313)
>     - OnlineLogisticRegression (https://issues.apache.org/jira/browse/FLINK-27170)
>   - regression:
>     - LinearRegression (https://issues.apache.org/jira/browse/FLINK-27093)
>   - classification:
>     - LinearSVC (https://issues.apache.org/jira/browse/FLINK-27091)
>   - evaluation:
>     - BinaryClassificationEvaluator (https://issues.apache.org/jira/browse/FLINK-27294)
> - A benchmark framework for Flink ML
>   (https://issues.apache.org/jira/browse/FLINK-26443)
> - A website for Flink ML users
>   (https://nightlies.apache.org/flink/flink-ml-docs-stable/)
> - Python support for Flink ML algorithms
>   (https://issues.apache.org/jira/browse/FLINK-26268,
>   https://issues.apache.org/jira/browse/FLINK-26269)
> - Several optimizations for Flink ML infrastructure
>   (https://issues.apache.org/jira/browse/FLINK-27096,
>   https://issues.apache.org/jira/browse/FLINK-27877)
>
> With the improvements and throughput benchmarks we have made, we think it
> is time to release Flink ML 2.1.0, so that interested developers in the
> community can try out the new Flink ML infra to develop algorithms with
> high throughput and low latency.
>
> If there are any concerns, please let us know.
>
> Best,
> Yun and Zhipeng