[
https://issues.apache.org/jira/browse/SAMOA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072168#comment-16072168
]
Maciej Grzenda commented on SAMOA-68:
-------------------------------------
Let me refer to the comment suggesting that we drop the Vote class and keep
the data contained in its objects in a double[] array.

For classification tasks, the class is reported in the prediction file we
create as a class label (not as a double) to avoid confusion; hence, it is a
String. Moreover, SAMOA internally sometimes reports a vote array that is
shorter than the number of classes (when the remaining votes are zero). Hence,
keeping an explicitly named Vote object, which holds the class label and the
value of the vote it refers to, seems to us more explicit and safer than
relying on knowing that, e.g., index 5 of the array holds the votes for the
third class.

From a wider perspective, now that the Kafka extension is prepared, I believe
that neither file (method accuracy or predictions) will be created under
extremely high load. If high throughput is expected, both the accuracy and the
prediction files should instead be produced in a streaming manner (similarly
to what, e.g., Spark does), i.e. as part* files. Hence, for this code (and for
other border-case problems such as this one), clarity could perhaps be
preferred over performance. To sum up, we suggest keeping the current solution
based on the Vote class rather than dropping it, which is what we understand
has been suggested.
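
To illustrate the concern about shortened vote arrays, here is a minimal Java
sketch. It assumes a hypothetical Vote class that pairs a class label with its
vote value (the real fields of SAMOA's Vote class are not shown in this
thread), and a hypothetical toVotes helper that pads a short double[] with
zeros so that every class gets an explicitly labelled vote.

```java
import java.util.ArrayList;
import java.util.List;

public class VoteSketch {

    // Hypothetical stand-in for the Vote class discussed above: it pairs a
    // class label with the vote value for that class, so readers do not have
    // to know which array index maps to which class.
    static final class Vote {
        final String label;
        final double value;

        Vote(String label, double value) {
            this.label = label;
            this.value = value;
        }

        @Override
        public String toString() {
            return label + "=" + value;
        }
    }

    // Hypothetical helper: converts a raw vote array into one Vote per class.
    // The array may be shorter than the number of classes when trailing votes
    // are zero, so missing entries are filled in as 0.0 instead of dropped.
    static List<Vote> toVotes(double[] rawVotes, String[] classLabels) {
        List<Vote> votes = new ArrayList<>(classLabels.length);
        for (int i = 0; i < classLabels.length; i++) {
            double value = i < rawVotes.length ? rawVotes[i] : 0.0;
            votes.add(new Vote(classLabels[i], value));
        }
        return votes;
    }

    public static void main(String[] args) {
        String[] labels = {"low", "medium", "high"};
        // Only two entries: the vote for "high" was zero and was omitted.
        double[] raw = {0.25, 0.75};
        System.out.println(toVotes(raw, labels)); // prints [low=0.25, medium=0.75, high=0.0]
    }
}
```

With explicit labels, a consumer of the prediction file never has to infer
which class a missing trailing vote belonged to.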
> Saving true and predicted labels to file
> ----------------------------------------
>
> Key: SAMOA-68
> URL: https://issues.apache.org/jira/browse/SAMOA-68
> Project: SAMOA
> Issue Type: New Feature
> Components: SAMOA-API
> Reporter: Maciej Grzenda
> Labels: features
>
> Currently PrequentialEvaluation task supports dumpFile option. With this
> option, model performance can be saved to a file. However, in some cases it
> would be good to also save the individual predictions made by a model. This is
> useful for model debugging and method development.
> This could also be used to visualize model output, calculate custom
> performance indicators (e.g. model accuracy for instances of a certain class
> or sharing the same feature value). Such saving of model output (if done)
> should be made for every instance. Hence, a new option making it possible to
> dump predictions to a separate file seems justified. For classification, it
> should include votes made for individual classes, if available.
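
A per-instance prediction dump along the lines described in the issue could
look roughly like the following Java sketch. The comma-separated layout (true
label, predicted label, then one vote per class), the writePrediction helper
and the rule of taking the highest vote as the predicted label are all
assumptions for illustration, not the format SAMOA actually writes.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PredictionDumpSketch {

    // Hypothetical helper: writes one comma-separated line per evaluated
    // instance with the true label, the predicted label (the class with the
    // highest vote; ties go to the first such class) and the vote for every
    // class. Assumes classLabels has at least one entry. Votes missing from a
    // shortened array are written as 0.0.
    static void writePrediction(PrintWriter out, String trueLabel,
                                double[] rawVotes, String[] classLabels) {
        StringBuilder voteColumns = new StringBuilder();
        int predicted = 0;
        double bestVote = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < classLabels.length; i++) {
            double vote = i < rawVotes.length ? rawVotes[i] : 0.0;
            if (vote > bestVote) {
                bestVote = vote;
                predicted = i;
            }
            voteColumns.append(',').append(vote);
        }
        out.println(trueLabel + "," + classLabels[predicted] + voteColumns);
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        String[] labels = {"low", "medium", "high"};
        // One line per instance, as the issue suggests.
        writePrediction(out, "high", new double[] {0.25, 0.75}, labels);
        writePrediction(out, "low", new double[] {0.9, 0.1, 0.0}, labels);
        out.flush();
        System.out.print(buffer);
        // prints:
        // high,medium,0.25,0.75,0.0
        // low,low,0.9,0.1,0.0
    }
}
```

Writing the true and predicted labels side by side keeps custom indicators,
such as per-class accuracy, computable directly from the dump.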
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)