[ 
https://issues.apache.org/jira/browse/SAMOA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072168#comment-16072168
 ] 

Maciej Grzenda commented on SAMOA-68:
-------------------------------------

Let me refer to the comment suggesting that we drop the Vote class and keep the 
data contained in objects of this class in a double[] table.

For classification tasks, the class is reported in the prediction file that we 
create as a class label (not as a double) to avoid confusion. Hence, it is a 
String. Moreover, SAMOA internally occasionally reports a table of votes that 
is shorter than the number of classes (when the remaining votes are zero). 
Hence, keeping a named vote object that states the class label/value the vote 
refers to seems to us clearer and safer than relying on knowing that e.g. 
index 5 of the table is the value of votes for the third class.

From a wider perspective, now that the Kafka extension is prepared, I believe 
that neither the accuracy file nor the prediction file will be created under 
extremely high load. If high throughput is expected, the accuracy and 
prediction files should instead be produced in a streaming manner (similar to 
what e.g. Spark does), i.e. as part* files. Hence, for this code (and other 
border-case problems like it), clarity could perhaps be preferred over 
performance.

To sum up, we suggest keeping the current solution based on the Vote class 
(and not dropping the Vote class, which is what we understand has been 
suggested).
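
To illustrate the argument, here is a minimal sketch (not the actual SAMOA 
code) of a named vote object compared with a bare double[] table. The names 
VoteSketch, Vote, toVotes and toLine, as well as the output line format, are 
hypothetical and chosen only for this example. The sketch assumes that a 
votes table shorter than the number of classes means the missing votes are 
zero, as described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and output format are assumptions, not SAMOA's API. */
final class VoteSketch {

    /** One named vote: the class label it refers to and the value of the vote. */
    static final class Vote {
        final String classLabel;
        final double value;

        Vote(String classLabel, double value) {
            this.classLabel = classLabel;
            this.value = value;
        }
    }

    /**
     * Pairs every class label with its vote. If the raw table is shorter than
     * the number of classes, the missing votes are treated as zero.
     */
    static List<Vote> toVotes(String[] classLabels, double[] rawVotes) {
        List<Vote> votes = new ArrayList<>();
        for (int i = 0; i < classLabels.length; i++) {
            double value = i < rawVotes.length ? rawVotes[i] : 0.0;
            votes.add(new Vote(classLabels[i], value));
        }
        return votes;
    }

    /** Formats one prediction line: true label, predicted label, then label=vote pairs. */
    static String toLine(String trueLabel, String predictedLabel, List<Vote> votes) {
        StringBuilder sb = new StringBuilder(trueLabel).append(',').append(predictedLabel);
        for (Vote v : votes) {
            sb.append(',').append(v.classLabel).append('=').append(v.value);
        }
        return sb.toString();
    }
}
```

With named votes, a reader of the prediction file does not need to know how 
many entries the internal table happened to contain for a given instance; 
each value carries the label it belongs to.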

> Saving true and predicted labels to file
> ----------------------------------------
>
>                 Key: SAMOA-68
>                 URL: https://issues.apache.org/jira/browse/SAMOA-68
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API
>            Reporter: Maciej Grzenda
>              Labels: features
>
> Currently the PrequentialEvaluation task supports the dumpFile option.  With 
> this option, model performance can be saved to a file. However, in some 
> cases it would also be good to save individual predictions made by a model.  
> This is useful for model debugging and method development.
> This could also be used to visualize model output or to calculate custom 
> performance indicators (e.g. model accuracy for instances of a certain class 
> or sharing the same feature value).  Such saving of model output (if done) 
> should be made for every instance. Hence, a new option making it possible to 
> dump predictions to a separate file seems justified.  For classification, it 
> should include votes made for individual classes, if available.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
