[jira] [Commented] (SAMOA-68) Saving true and predicted labels to file

ASF GitHub Bot (JIRA) Mon, 03 Jul 2017 07:11:24 -0700

    [ 
https://issues.apache.org/jira/browse/SAMOA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072495#comment-16072495
 ]


ASF GitHub Bot commented on SAMOA-68:
-------------------------------------

Github user mgrzenda commented on a diff in the pull request:

    https://github.com/apache/incubator-samoa/pull/61#discussion_r125298088
  
    --- Diff: samoa-api/src/main/java/org/apache/samoa/moa/core/Vote.java ---
    @@ -0,0 +1,90 @@
    +package org.apache.samoa.moa.core;
    +
    +/*
    + * #%L
    + * SAMOA
    + * %%
    + * Copyright (C) 2014 - 2015 Apache Software Foundation
    + * %%
    + * Licensed under the Apache License, Version 2.0 (the "License");
    + * you may not use this file except in compliance with the License.
    + * You may obtain a copy of the License at
    + * 
    + *      http://www.apache.org/licenses/LICENSE-2.0
    + * 
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + * #L%
    + */
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.samoa.moa.AbstractMOAObject;
    +
    +/**
    --- End diff --
    
    The class is reported in the prediction file we create as a class label
    (not as a double) to avoid confusion. Hence, it is a String. Moreover,
    Samoa occasionally reports, internally, a table of votes that is shorter
    than the number of classes (when the remaining votes are zero). Hence,
    keeping an explicit, named vote object that carries the class label and
    the value of the vote it refers to seemed safer to us than relying on
    knowing which index of the table holds the votes for which class.

    From a wider perspective, now that the Kafka extension is ready, I believe
    that the prediction file, like the file with the accuracy of the methods,
    will not be created under extremely high load. Otherwise, if high
    throughput is expected, accuracy and prediction files should be produced
    in a streaming manner (similarly to what e.g. Spark does), i.e. as part*
    files. Hence, for this code (and other border cases such as this one),
    clarity could perhaps be preferred over performance.

    To sum up, we suggest keeping the current solution based on the Vote class
    (and not dropping the Vote class, which is what we understand you
    suggested).
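
    To illustrate the point about short vote tables, here is a minimal sketch.
    It is not the Vote class from the pull request: the names VoteSketch,
    LabeledVote, labelVotes and toLine are hypothetical, and the output
    format is an assumption made only for this example. It pairs each class
    label with its vote and treats entries missing from a shorter table as
    zero votes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Vote class from the pull request.
public class VoteSketch {

    /** One vote: the class label it refers to and the value of the vote. */
    static final class LabeledVote {
        final String classLabel;
        final double value;

        LabeledVote(String classLabel, double value) {
            this.classLabel = classLabel;
            this.value = value;
        }
    }

    /**
     * Pairs every class label with its vote. The votes table may be shorter
     * than the list of labels; missing trailing entries count as zero votes.
     */
    static List<LabeledVote> labelVotes(String[] classLabels, double[] votes) {
        List<LabeledVote> result = new ArrayList<>();
        for (int i = 0; i < classLabels.length; i++) {
            double value = i < votes.length ? votes[i] : 0.0;
            result.add(new LabeledVote(classLabels[i], value));
        }
        return result;
    }

    /** Formats the votes as comma-separated label=value pairs (assumed format). */
    static String toLine(List<LabeledVote> votes) {
        StringBuilder line = new StringBuilder();
        for (LabeledVote vote : votes) {
            if (line.length() > 0) {
                line.append(',');
            }
            line.append(vote.classLabel).append('=').append(vote.value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"setosa", "versicolor", "virginica"};
        double[] shortVotes = {0.25, 0.75}; // no entry for the third class
        System.out.println(toLine(labelVotes(labels, shortVotes)));
    }
}
```

    With explicit labels, a reader of the prediction file does not need to
    know how many classes exist or in what order the votes were stored.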


> Saving true and predicted labels to file
> ----------------------------------------
>
>                 Key: SAMOA-68
>                 URL: https://issues.apache.org/jira/browse/SAMOA-68
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API
>            Reporter: Maciej Grzenda
>              Labels: features
>
> Currently the PrequentialEvaluation task supports the dumpFile option. With
> this option, model performance can be saved to a file. However, in some cases
> it would also be good to save the individual predictions made by a model.
> This is useful for model debugging and method development.
> This could also be used to visualize model output or to calculate custom
> performance indicators (e.g. model accuracy for instances of a certain class
> or sharing the same feature value). Such saving of model output (if done)
> should be made for every instance. Hence, a new option making it possible to
> dump predictions to a separate file seems justified. For classification, it
> should include the votes made for individual classes, if available.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
