Repository: flink
Updated Branches:
  refs/heads/master a137321ac -> eb23f8074

[FLINK-2272] [ml] Removed roadmap and vision from docs, added link to them in the wiki.

This closes #864.

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/4cc7cf35
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/4cc7cf35
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/4cc7cf35

Branch: refs/heads/master
Commit: 4cc7cf35e156ede9b2c146ce61b6d931bfe921d7
Parents: a137321
Author: Theodore Vasiloudis <t...@sics.se>
Authored: Wed Jun 24 13:55:38 2015 +0200
Committer: Till Rohrmann <trohrm...@apache.org>
Committed: Thu Jul 2 14:31:58 2015 +0200

----------------------------------------------------------------------
 docs/libs/ml/index.md          |  2 +-
 docs/libs/ml/vision_roadmap.md | 99 -------------------------------------
 2 files changed, 1 insertion(+), 100 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/4cc7cf35/docs/libs/ml/index.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/index.md b/docs/libs/ml/index.md
index e81b354..63cdf43 100644
--- a/docs/libs/ml/index.md
+++ b/docs/libs/ml/index.md
@@ -24,7 +24,7 @@ FlinkML is the Machine Learning (ML) library for Flink. It is a new effort in th
 with a growing list of algorithms and contributors. With FlinkML we aim to provide scalable ML
 algorithms, an intuitive API, and tools that help minimize glue code in end-to-end ML systems. You
 can see more details about our goals and where the library is headed in our [vision
-and roadmap here](vision_roadmap.html).
+and roadmap here](https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap).
 * This will be replaced by the TOC
 {:toc}

http://git-wip-us.apache.org/repos/asf/flink/blob/4cc7cf35/docs/libs/ml/vision_roadmap.md
----------------------------------------------------------------------
diff --git a/docs/libs/ml/vision_roadmap.md b/docs/libs/ml/vision_roadmap.md
deleted file mode 100644
index 24b651e..0000000
--- a/docs/libs/ml/vision_roadmap.md
+++ /dev/null
@@ -1,99 +0,0 @@
----
-htmlTitle: FlinkML - Vision and Roadmap
-title: <a href="../ml">FlinkML</a> - Vision and Roadmap
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-* This will be replaced by the TOC
-{:toc}
-
-## Vision
-
-The Machine Learning (ML) library for Flink is a new effort to bring scalable ML tools to the Flink
-community. Our goal is is to design and implement a system that is scalable and can deal with
-problems of various sizes, whether your data size is measured in megabytes or terabytes and beyond.
-We call this library FlinkML.
-
-An important concern for developers of ML systems is the amount of glue code that developers are
-forced to write [1] in the process of implementing an end-to-end ML system. Our goal with FlinkML
-is to help developers keep glue code to a minimum. The Flink ecosystem provides a great setting to
-tackle this problem, with its scalable ETL capabilities that can be easily combined inside the same
-program with FlinkML, allowing the development of robust pipelines without the need to use yet
-another technology for data ingestion and data munging.
-
-Another goal for FlinkML is to make the library easy to use. To that end we will be providing
-detailed documentation along with examples for every part of the system. Our aim is that developers
-will be able to get started with writing their ML pipelines quickly, using familiar programming
-concepts and terminology.
-
-Contrary to other data-processing systems, Flink exploits in-memory data streaming, and natively
-executes iterative processing algorithms which are common in ML. We plan to exploit the streaming
-nature of Flink, and provide functionality designed specifically for data streams.
-
-FlinkML will allow data scientists to test their models locally and using subsets of data, and then
-use the same code to run their algorithms at a much larger scale in a cluster setting.
-
-We are inspired by other open source efforts to provide ML systems, in particular
-[scikit-learn](http://scikit-learn.org/) for cleanly specifying ML pipelines, and Spark's
-[MLLib](https://spark.apache.org/mllib/) for providing ML algorithms that scale with problem and
-cluster sizes.
-
-## Roadmap
-
-The roadmap below can provide an indication of the algorithms we aim to implement in the coming
-months. If you are interested in helping out, please check our [contribution guide](contribution_guide.html).
-Items in **bold** have already been implemented:
-
-* Pipelines of transformers and learners
-* Data pre-processing
-  * **Feature scaling**
-  * **Polynomial feature base mapper**
-  * Feature hashing
-  * Feature extraction for text
-  * Dimensionality reduction
-* Model selection and performance evaluation
-  * Cross-validation for model selection and evaluation
-* Supervised learning
-  * Optimization framework
-    * **Stochastic Gradient Descent**
-    * L-BFGS
-  * Generalized Linear Models
-    * **Multiple linear regression**
-    * LASSO, Ridge regression
-    * Multi-class Logistic regression
-  * Random forests
-  * **Support Vector Machines**
-* Unsupervised learning
-  * Clustering
-    * K-means clustering
-  * PCA
-* Recommendation
-  * **ALS**
-* Text analytics
-  * LDA
-* Statistical estimation tools
-* Distributed linear algebra
-* Streaming ML
-
-**References:**
-
-[1] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary,
-and M. Young. _Machine learning: The high interest credit card of technical debt._ In SE4ML:
-Software Engineering for Machine Learning (NIPS 2014 Workshop), 2014.