[
https://issues.apache.org/jira/browse/FLINK-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576767#comment-14576767
]
ASF GitHub Bot commented on FLINK-2072:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/792#discussion_r31896308
--- Diff: docs/libs/ml/quickstart.md ---
@@ -24,4 +24,198 @@ under the License.
* This will be replaced by the TOC
{:toc}
-Coming soon.
+## Introduction
+
+FlinkML is designed to make learning from your data a straight-forward process, abstracting away
+the complexities that usually come with having to deal with big data learning tasks. In this
+quick-start guide we will show just how easy it is to solve a simple supervised learning problem
+using FlinkML. But first some basics, feel free to skip the next few lines if you're already
+familiar with Machine Learning (ML)
+
+As defined by Murphy [cite ML-APP] ML deals with detecting patterns in data, and using those
+learned patterns to make predictions about the future. We can categorize most ML algorithms into
+two major categories: Supervised and Unsupervised Learning.
+
+* Supervised Learning deals with learning a function (mapping) from a set of inputs
+(predictors) to a set of outputs. The learning is done using a __training set__ of (input,
+output) pairs that we use to approximate the mapping function. Supervised learning problems are
+further divided into classification and regression problems. In classification problems we try to
+predict the __class__ that an example belongs to, for example whether a user is going to click on
+an ad or not. Regression problems are about predicting (real) numerical values, often called the dependent
+variable, for example what the temperature will be tomorrow.
+
+* Unsupervised learning deals with discovering patterns and regularities in the data. An example
+of this would be __clustering__, where we try to discover groupings of the data from the
+descriptive features. Unsupervised learning can also be used for feature selection, for example
+through [principal components analysis](https://en.wikipedia.org/wiki/Principal_component_analysis).
+
+## Loading data
+
+For loading data to be used with FlinkML we can use the ETL capabilities of Flink, or specialized
+functions for formatted data, such as the LibSVM format. For supervised learning problems it is
+common to use the `LabeledVector` class to represent the `(features, label)` examples. A `LabeledVector`
+object will have a FlinkML `Vector` member representing the features of the example and a `Double`
+member which represents the label, which could be the class in a classification problem, or the dependent
+variable for a regression problem.
+
+# TODO: Get dataset that has separate train and test sets
--- End diff --
Isn't the TODO fixed?
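As an aside on the "Loading data" paragraph quoted in the diff: the sketch below shows how one LibSVM-formatted line could map to a `(features, label)` pair. It is plain Python and does not use the FlinkML API. `parse_libsvm_line` is a hypothetical helper, and storing the features as a dict with 0-based indices is an assumption of this sketch, not a description of how `LabeledVector` works.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one LibSVM line into (label, sparse features).

    A LibSVM line has the form "<label> <index>:<value> <index>:<value> ...".
    Feature indices in the file are 1-based; features not listed are zero.
    """
    tokens = line.split()
    label = float(tokens[0])  # first token is the label
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        # Shift to 0-based indices (a choice made for this sketch).
        features[int(index) - 1] = float(value)
    return label, features


# Hypothetical example line: label 1, two non-zero features.
label, features = parse_libsvm_line("1 3:0.5 7:2.0")
```

The label would be the class in a classification problem or the dependent variable in a regression problem, as the quoted paragraph describes.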
> Add a quickstart guide for FlinkML
> ----------------------------------
>
> Key: FLINK-2072
> URL: https://issues.apache.org/jira/browse/FLINK-2072
> Project: Flink
> Issue Type: New Feature
> Components: Documentation, Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Theodore Vasiloudis
> Fix For: 0.9
>
>
> We need a quickstart guide that introduces users to the core concepts of
> FlinkML to get them up and running quickly.