[ https://issues.apache.org/jira/browse/SAMOA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072495#comment-16072495 ]
ASF GitHub Bot commented on SAMOA-68:
-------------------------------------
Github user mgrzenda commented on a diff in the pull request:
https://github.com/apache/incubator-samoa/pull/61#discussion_r125298088
--- Diff: samoa-api/src/main/java/org/apache/samoa/moa/core/Vote.java ---
@@ -0,0 +1,90 @@
+package org.apache.samoa.moa.core;
+
+/*
+ * #%L
+ * SAMOA
+ * %%
+ * Copyright (C) 2014 - 2015 Apache Software Foundation
+ * %%
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * #L%
+ */
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.samoa.moa.AbstractMOAObject;
+
+/**
--- End diff --
The class is reported in the prediction file we create as a class label (not as
a double) to avoid confusion; hence, it is a String. Moreover, SAMOA internally
occasionally reports a shorter array of votes than the number of classes (when
the remaining votes are zero). Hence, keeping an explicit, named Vote object
that holds the class label and the vote value it refers to seemed safer to us
than relying on knowing that, e.g., index 5 of the array holds the votes for
the third class. From a wider perspective, now that the Kafka extension is
ready, I believe that, as with the file storing the accuracy of the methods,
neither file will be created under extremely high load. If high throughput
were expected, the accuracy and prediction files should instead be produced in
a streaming manner (similar to what, e.g., Spark does), i.e. as part* files.
Hence, for this code (and for other edge cases like this one), clarity could
perhaps be preferred over performance. To sum up, we suggest keeping the
current solution based on the Vote class rather than dropping the Vote class,
which is what we understand you suggested.
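To illustrate the argument above, here is a minimal Java sketch of a label-keyed vote. It assumes a hypothetical `VoteSketch` class with a nested `Vote` that holds a `String` class label and a `double` vote value. The diff only shows that the real `Vote.java` imports `AbstractMOAObject`; its fields are not visible, so every name here is an assumption. The hypothetical `toVotes` helper pairs each class label with its vote. When the votes array is shorter than the label list, it fills in 0.0, so nothing downstream has to rely on array positions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the Vote class from the PR. */
public class VoteSketch {

    /** One vote: the class label it refers to and the vote value for it. */
    public static final class Vote {
        private final String classLabel;
        private final double value;

        public Vote(String classLabel, double value) {
            this.classLabel = classLabel;
            this.value = value;
        }

        public String getClassLabel() { return classLabel; }

        public double getValue() { return value; }

        @Override
        public String toString() { return classLabel + ":" + value; }
    }

    /**
     * Pairs each class label with its vote. The votes array may be shorter
     * than the label list when trailing votes are zero, so missing entries
     * are filled with 0.0 instead of being left implicit.
     */
    public static List<Vote> toVotes(double[] votes, List<String> classLabels) {
        List<Vote> result = new ArrayList<>(classLabels.size());
        for (int i = 0; i < classLabels.size(); i++) {
            double v = i < votes.length ? votes[i] : 0.0;
            result.add(new Vote(classLabels.get(i), v));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("low", "medium", "high");
        // Only two entries: the vote for "high" is zero and was omitted.
        double[] votes = {0.25, 0.75};
        System.out.println(toVotes(votes, labels)); // prints [low:0.25, medium:0.75, high:0.0]
    }
}
```

The omitted trailing vote becomes explicit, so a reader of the prediction file never has to infer which class a position refers to.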
> Saving true and predicted labels to file
> ----------------------------------------
>
> Key: SAMOA-68
> URL: https://issues.apache.org/jira/browse/SAMOA-68
> Project: SAMOA
> Issue Type: New Feature
> Components: SAMOA-API
> Reporter: Maciej Grzenda
> Labels: features
>
> Currently, the PrequentialEvaluation task supports the dumpFile option. With
> this option, model performance can be saved to a file. However, in some cases
> it would also be good to save the individual predictions made by a model. This
> is useful for model debugging and method development.
> This could also be used to visualize model output or to calculate custom
> performance indicators (e.g. model accuracy for instances of a certain class
> or sharing the same feature value). If model output is saved, it should be
> saved for every instance. Hence, a new option making it possible to dump
> predictions to a separate file seems justified. For classification, the output
> should include the votes made for individual classes, if available.
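The per-instance dump that the issue proposes can be sketched in Java as one CSV line per evaluated instance. Each line holds the true class, the predicted class (the arg-max of the votes) and one vote column per class. The class name `PredictionDumpSketch`, the column names and the CSV layout are all hypothetical; they are not SAMOA's actual option or file format. The sketch writes to any `java.io.Writer`, so its output could go to a file next to the existing dumpFile output.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

/** Hypothetical sketch of a per-instance prediction dump; not the SAMOA API. */
public class PredictionDumpSketch {

    private final Writer out;
    private final List<String> classLabels;
    private long instanceIndex = 0;

    public PredictionDumpSketch(Writer out, List<String> classLabels) throws IOException {
        this.out = out;
        this.classLabels = classLabels;
        // Header row: one vote column per class label.
        StringBuilder header = new StringBuilder("instance,true_class,predicted_class");
        for (String label : classLabels) {
            header.append(",votes_").append(label);
        }
        out.write(header.append('\n').toString());
    }

    /** Writes one line per instance; votes may be shorter than the label list. */
    public void write(int trueClassIndex, double[] votes) throws IOException {
        // Arg-max over the available votes (defaults to the first class if votes is empty).
        int predicted = 0;
        for (int i = 1; i < votes.length; i++) {
            if (votes[i] > votes[predicted]) {
                predicted = i;
            }
        }
        StringBuilder line = new StringBuilder();
        line.append(instanceIndex++).append(',')
            .append(classLabels.get(trueClassIndex)).append(',')
            .append(classLabels.get(predicted));
        // Pad omitted trailing votes with 0.0 so every row has the same columns.
        for (int i = 0; i < classLabels.size(); i++) {
            line.append(',').append(i < votes.length ? votes[i] : 0.0);
        }
        out.write(line.append('\n').toString());
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        PredictionDumpSketch dump = new PredictionDumpSketch(sw, List.of("a", "b", "c"));
        dump.write(1, new double[] {0.2, 0.8});
        dump.write(2, new double[] {0.1, 0.3, 0.6});
        System.out.print(sw);
    }
}
```

In the example in `main`, the first record is `0,b,b,0.2,0.8,0.0`. The omitted vote for class `c` is written as 0.0, so every line has the same number of columns.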
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)