[
https://issues.apache.org/jira/browse/FLINK-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574442#comment-16574442
]
ASF GitHub Bot commented on FLINK-9664:
---------------------------------------
azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing
documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r208837648
##########
File path: docs/dev/libs/ml/quickstart.md
##########
@@ -129,6 +129,10 @@ and the [test set
here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
This is an astroparticle binary classification dataset, used by Hsu et al.
[[3]](#hsu) in their
practical Support Vector Machine (SVM) guide. It contains 4 numerical
features, and the class label.
+Before importing the training and test dataset, Flink SVM only supports
threshold binary values of
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1
dataset since it is
+labelled using `1`s and `0`s.
+
Review comment:
By sections I mean the `LibSVM files` and `Classification` parts of
`quickstart.md`.
I think your explanation of why the conversion is needed was good and detailed
enough. I only suggested rephrasing its opening so that it can move to the
beginning of the `Classification` section. The conversion example can then
follow your explanation. The overall structure I suggest:
*LibSVM files*
...Text as before...
```
((( leave only lib SVM importing specifics in this example: )))
val astroTrainLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env,
"/path/to/svmguide1")
val astroTestLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env,
"/path/to/svmguide1.t")
```
...Text as before..
*Classification*
...Explanation of conversion need before classification...:
```
// conversion code example, e.g. the one I suggested
```
...section continues as it was with classification description and its
example..
The idea is that, in the end, the user can copy/paste the code snippets in
order (import code first, then conversion/normalisation, then classification,
etc.) and everything works together.
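
For illustration, the conversion step could look roughly like the sketch below. It is only a sketch under assumptions: `toSvmLabel` is a hypothetical helper name (not part of FlinkML), and the commented usage assumes the `astroTrainLibSVM` / `astroTestLibSVM` datasets from the import snippet above, with `LabeledVector` exposing `label` and `vector` fields.

```scala
// Hypothetical helper (not part of FlinkML): maps a 0/1 label to the
// -1.0/+1.0 labels that the SVM expects. Any positive label becomes +1.0;
// everything else (including 0) becomes -1.0.
def toSvmLabel(label: Double): Double =
  if (label > 0.0) 1.0 else -1.0

// Assumed usage with the datasets read in the import snippet
// (requires FlinkML on the classpath):
//   val astroTrain: DataSet[LabeledVector] =
//     astroTrainLibSVM.map(lv => LabeledVector(toSvmLabel(lv.label), lv.vector))
//   val astroTest: DataSet[LabeledVector] =
//     astroTestLibSVM.map(lv => LabeledVector(toSvmLabel(lv.label), lv.vector))
```

Keeping the label mapping in one small function means the training and test sets go through the same conversion, so the snippets can be pasted in sequence without diverging.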
----------------------------------------------------------------
> FlinkML Quickstart Loading Data section example doesn't work as described
> -------------------------------------------------------------------------
>
> Key: FLINK-9664
> URL: https://issues.apache.org/jira/browse/FLINK-9664
> Project: Flink
> Issue Type: Bug
> Components: Documentation, Machine Learning Library
> Affects Versions: 1.5.0
> Reporter: Mano Swerts
> Assignee: Rong Rong
> Priority: Major
> Labels: documentation-update, machine_learning, ml,
> pull-request-available
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The ML documentation example isn't complete:
> [https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html#loading-data]
> The referred section loads data from an astroparticle binary classification
> dataset to showcase SVM. The dataset uses 0 and 1 as labels, which doesn't
> produce correct results. The SVM predictor expects -1 and 1 labels to
> correctly predict the label. The documentation, however, doesn't mention
> that. The example therefore fails without any indication of why.
> The documentation should be updated with an explicit mention of the -1 and 1
> labels and a mapping function that shows the conversion of the labels.