+1. It looks like we have some decent progress on Flink ML :)

Thanks,

Jiangjie (Becket) Qin

On Fri, Jun 24, 2022 at 8:29 AM Dong Lin <lindon...@gmail.com> wrote:

> Hi Zhipeng and Yun,
>
> Thanks for starting the discussion. +1 for the Flink ML 2.1.0 release.
>
> Cheers,
> Dong
>
> On Thu, Jun 23, 2022 at 11:15 AM Zhipeng Zhang <zhangzhipe...@gmail.com>
> wrote:
>
> > Hi devs,
> >
> > Yun and I would like to start a discussion for releasing Flink ML
> > <https://github.com/apache/flink-ml> 2.1.0.
> >
> > In the past few months, we focused on improving the infra (e.g. memory
> > management, benchmark infra, online training, Python support) of Flink ML
> > by implementing, benchmarking, and optimizing 9 new algorithms in Flink ML.
> > Our results have shown that Flink ML is able to meet or exceed the
> > performance of selected algorithms in alternative popular ML libraries.
> >
> > Please see below for a detailed list of improvements:
> >
> > - A set of representative machine learning algorithms:
> >   - feature engineering:
> >     - MinMaxScaler (https://issues.apache.org/jira/browse/FLINK-25552)
> >     - StringIndexer (https://issues.apache.org/jira/browse/FLINK-25527)
> >     - VectorAssembler (https://issues.apache.org/jira/browse/FLINK-25616)
> >     - StandardScaler (https://issues.apache.org/jira/browse/FLINK-26626)
> >     - Bucketizer (https://issues.apache.org/jira/browse/FLINK-27072)
> >   - online learning:
> >     - OnlineKmeans (https://issues.apache.org/jira/browse/FLINK-26313)
> >     - OnlineLogisticRegression (
> >       https://issues.apache.org/jira/browse/FLINK-27170)
> >   - regression:
> >     - LinearRegression (https://issues.apache.org/jira/browse/FLINK-27093)
> >   - classification:
> >     - LinearSVC (https://issues.apache.org/jira/browse/FLINK-27091)
> >   - evaluation:
> >     - BinaryClassificationEvaluator (
> >       https://issues.apache.org/jira/browse/FLINK-27294)
> > - A benchmark framework for Flink ML (
> >   https://issues.apache.org/jira/browse/FLINK-26443)
> > - A website for Flink ML users (
> >   https://nightlies.apache.org/flink/flink-ml-docs-stable/)
> > - Python support for Flink ML algorithms (
> >   https://issues.apache.org/jira/browse/FLINK-26268,
> >   https://issues.apache.org/jira/browse/FLINK-26269)
> > - Several optimizations for Flink ML infrastructure (
> >   https://issues.apache.org/jira/browse/FLINK-27096,
> >   https://issues.apache.org/jira/browse/FLINK-27877)
> >
> > With the improvements and throughput benchmarks we have made, we think it
> > is time to release Flink ML 2.1.0, so that interested developers in the
> > community can try out the new Flink ML infra to develop algorithms with
> > high throughput and low latency.
> >
> > If there is any concern, please let us know.
> >
> > Best,
> > Yun and Zhipeng