azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r209894118
 
 

 ##########
 File path: docs/dev/libs/ml/quickstart.md
 ##########
 @@ -146,7 +145,23 @@ create a classifier.
 
 ## Classification
 
-Once we have imported the dataset we can train a `Predictor` such as a linear SVM classifier.
+After importing the training and test dataset, they need to be prepared for the classification. 
+Because Flink SVM only supports threshold binary values of `+1.0` and `-1.0`, a conversion is 
+needed after loading the LibSVM dataset since it is labelled using `1`s and `0`s.
+
+A conversion can be done using a simple normalizer mapping function:
+ 
+{% highlight scala %}
+
+def normalizer : LabeledVector => LabeledVector = { 
+    lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)
+}
+val astroTrain: DataSet[LabeledVector] = astroTrainLibSVM.map(normalizer)
+val astroTest: DataSet[(Vector, Double)] = astroTestLibSVM.map(normalizer).map(x => (x.vector, x.label))
+
+{% endhighlight %}
+
+Once we have the converted the dataset we can train a `Predictor` such as a linear SVM classifier.
 
 Review comment:
   One superfluous `the`:
   Once we have the converted **the** dataset

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
