[
https://issues.apache.org/jira/browse/FLINK-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609840#comment-14609840
]
ASF GitHub Bot commented on FLINK-2297:
---------------------------------------
Github user thvasilo commented on a diff in the pull request:
https://github.com/apache/flink/pull/874#discussion_r33663520
--- Diff:
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala
---
@@ -242,8 +275,21 @@ object SVM{
}
}
- override def predict(value: T, model: DenseVector): Double = {
- value.asBreeze dot model.asBreeze
+ override def predict(value: T, model: DenseVector, predictParameters: ParameterMap):
+ Double = {
+ val thresholdOption = predictParameters.get(Threshold)
+
+ val rawValue = value.asBreeze dot model.asBreeze
+ // If the Threshold option has been reset, we will get back a Some(None) thresholdOption
+ // causing the exception when we try to get the value. In that case we just return the
+ // raw value
+ try {
+ val thresOptionValue = thresholdOption.get
+ if (rawValue > thresOptionValue) 1.0 else -1.0
+ }
+ catch {
+ case e: java.lang.ClassCastException => rawValue
+ }
--- End diff --
This relates to the previous discussion.
I do believe we want thresholding turned on by default: when you train a binary
classifier, you expect `predict` to return binary labels, not the
decision function values.
So if we had `None` as the default, the user could write:
```scala
val svm = SVM().
setBlocks(env.getParallelism)
svm.fit(train)
val eval = svm.evaluate(test)
```
and the eval output would not make sense, because `predict` would return raw decision values instead of labels. But if they wrote
```scala
val svm = SVM().
setBlocks(env.getParallelism).
setThreshold(0.0)
svm.fit(train)
val eval = svm.evaluate(test)
```
it would.
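
To make the two behaviours concrete, here is a minimal stand-alone sketch of thresholded prediction in plain Scala. It is not the FlinkML code: `ThresholdSketch`, `dot`, and the `Option[Double]` argument are hypothetical stand-ins for the `ParameterMap` lookup, and pattern matching on the option is only one possible way to express the "threshold set" versus "threshold reset" cases.

```scala
// Hypothetical sketch, independent of Flink; names are illustrative only.
object ThresholdSketch {

  /** Dot product of two dense vectors of equal length. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  /**
   * With Some(t): returns 1.0 if the decision value is above t, else -1.0.
   * With None: returns the raw decision function value.
   */
  def predict(
      features: Array[Double],
      weights: Array[Double],
      threshold: Option[Double]): Double = {
    val rawValue = dot(features, weights)
    threshold match {
      case Some(t) => if (rawValue > t) 1.0 else -1.0
      case None    => rawValue
    }
  }
}
```

With a default of `Some(0.0)`, a call such as `ThresholdSketch.predict(x, w, Some(0.0))` yields labels that an evaluation step can compare against the true labels, while passing `None` exposes the raw decision values.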
> Add threshold setting for SVM binary predictions
> ------------------------------------------------
>
> Key: FLINK-2297
> URL: https://issues.apache.org/jira/browse/FLINK-2297
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Theodore Vasiloudis
> Priority: Minor
> Labels: ML
> Fix For: 0.10
>
>
> Currently SVM outputs the raw decision function values when using the predict
> function.
> We should instead have the ability to set a threshold: examples with a decision
> value above it are labeled as positive (1.0), and those below it as negative (-1.0).
> The prediction function can then be used directly for evaluation.