azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing 
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r208576999
 
 

 ##########
 File path: docs/dev/libs/ml/quickstart.md
 ##########
 @@ -129,6 +129,10 @@ and the [test set 
here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. 
[[3]](#hsu) in their
 practical Support Vector Machine (SVM) guide. It contains 4 numerical 
features, and the class label.
 
+Before importing the traning and test dataset, Flink SVM only supports 
threshold binary values of 
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1 
dataset since it is 
+labelled using `1`s and `0`s.
+
 
 Review comment:
   I think this paragraph belongs at the beginning of the next section, 
`Classification`, because the current section is about the LibSVM format.
   A code example of the conversion could also be provided to make the example 
fully 'copy-paste' runnable.
   One small thing: there is also a typo, `traning` -> `training`.
   
   I would suggest modifying the code example in this `LibSVM files` section 
like this:
   ```
   val astroTrainLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1")
   val astroTestLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1.t")
   ```
   so that it contains no SVM training specifics, and adding something like 
this to the beginning of the `Classification` section:
   
   _... After importing the training and test dataset, the data needs to be 
prepared for the classification, because Flink SVM only supports ... conversion 
is needed after downloading ..._
   And then the code example:
   ```
   def svmNormaliser : LabeledVector => LabeledVector =
       lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)
   val astroTrain: DataSet[LabeledVector] = astroTrainLibSVM.map(svmNormaliser)
   val astroTest: DataSet[(Vector, Double)] =
       astroTestLibSVM.map(svmNormaliser).map(x => (x.vector, x.label))
   ```
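
   The label mapping itself can also be checked without a Flink environment. 
The following is only a minimal sketch in plain Scala: `LabeledPoint` and 
`SvmLabelCheck` are hypothetical stand-ins (not Flink's `LabeledVector` or any 
FlinkML API), and the feature values are made up for illustration.
   ```scala
   // Hypothetical stand-in for Flink's LabeledVector, for illustration only.
   final case class LabeledPoint(label: Double, features: Vector[Double])

   object SvmLabelCheck {
     // Same rule as svmNormaliser: positive labels become +1.0, all others -1.0.
     def normalise(p: LabeledPoint): LabeledPoint =
       p.copy(label = if (p.label > 0.0) 1.0 else -1.0)

     def main(args: Array[String]): Unit = {
       // Made-up rows with 4 features each, labelled with 1 and 0.
       val raw = Seq(
         LabeledPoint(1.0, Vector(0.5, 1.2, 0.1, 3.4)),
         LabeledPoint(0.0, Vector(0.9, 0.3, 2.2, 1.1))
       )
       // Print each row after its label has been mapped.
       raw.map(normalise).foreach(println)
     }
   }
   ```
   The features are left untouched; only the label is rewritten, which matches 
what `svmNormaliser` does on the real `DataSet`.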
