[ https://issues.apache.org/jira/browse/SAMOA-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193383#comment-15193383 ]
ASF GitHub Bot commented on SAMOA-58:
-------------------------------------
Github user edi-bice commented on a diff in the pull request:
https://github.com/apache/incubator-samoa/pull/48#discussion_r56010998
--- Diff: samoa-api/src/main/java/org/apache/samoa/streams/ArffFileStream.java ---
@@ -57,32 +60,39 @@ public void prepareForUseImpl(TaskMonitor monitor, ObjectRepository repository)
   @Override
   protected void reset() {
     try {
-      if (this.fileReader != null)
-        this.fileReader.close();
-
       fileSource.reset();
     } catch (IOException ioe) {
       throw new RuntimeException("FileStream restart failed.", ioe);
     }
-    if (!getNextFileReader()) {
+    if (!getNextFileStream()) {
       hitEndOfStream = true;
       throw new RuntimeException("FileStream is empty.");
     }
   }
   @Override
-  protected boolean getNextFileReader() {
-    boolean ret = super.getNextFileReader();
-    if (ret) {
-      this.instances = new Instances(this.fileReader, 1, -1);
-      if (this.classIndexOption.getValue() < 0) {
-        this.instances.setClassIndex(this.instances.numAttributes() - 1);
-      } else if (this.classIndexOption.getValue() > 0) {
-        this.instances.setClassIndex(this.classIndexOption.getValue() - 1);
+  protected boolean getNextFileStream() {
--- End diff --
Good point. I'm not sure whether there are tests that cover both scenarios. But
from the code, at least, it seems that the file reader/stream is closed (and set
to null) when reading an instance fails at end of file, and is also checked and
closed upon a call to getNextFileStream.
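
To make the two close paths concrete, here is a minimal sketch of the pattern
described above. It is not the SAMOA code: the class MultiSourceLineStream, its
nextLine() method, and the use of in-memory strings in place of files are all
hypothetical, chosen only so the example is self-contained. Only the name
getNextFileStream is borrowed from the diff.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a multi-file stream; NOT the SAMOA ArffFileStream.
class MultiSourceLineStream {
  // Each String plays the role of one file's contents (assumption for the sketch).
  private final Iterator<String> sources;
  // Null whenever no source is currently open.
  private BufferedReader current;

  MultiSourceLineStream(List<String> fileContents) {
    this.sources = fileContents.iterator();
  }

  // Close path 2: check for a reader that is still open and close it,
  // then open the next source if one is left.
  protected boolean getNextFileStream() {
    closeCurrent();
    if (!sources.hasNext()) {
      return false;
    }
    current = new BufferedReader(new StringReader(sources.next()));
    return true;
  }

  // Returns the next line across all sources, or null once every source is exhausted.
  public String nextLine() {
    try {
      while (true) {
        if (current == null && !getNextFileStream()) {
          return null;
        }
        String line = current.readLine();
        if (line != null) {
          return line;
        }
        // Close path 1: end of this source, so close the reader and set it to
        // null, then loop around to try the next source.
        closeCurrent();
      }
    } catch (IOException ioe) {
      throw new UncheckedIOException("Stream read failed.", ioe);
    }
  }

  private void closeCurrent() {
    if (current != null) {
      try {
        current.close();
      } catch (IOException ioe) {
        throw new UncheckedIOException("Stream close failed.", ioe);
      }
      current = null;
    }
  }
}
```

Because closeCurrent() nulls the reader, the check in getNextFileStream is a
no-op when end of file has already closed it, so neither path closes a reader
twice. Calling nextLine() repeatedly on sources "a\nb\n", "" and "c\n" walks
across all three, including the empty one, before returning null.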
> Samoa AvroFileStream from HDFSFileStreamSource stops at end of first file
> -------------------------------------------------------------------------
>
> Key: SAMOA-58
> URL: https://issues.apache.org/jira/browse/SAMOA-58
> Project: SAMOA
> Issue Type: Bug
> Components: SAMOA-Instances
> Environment: RHEL 6.6, java 1.8.0_72
> Reporter: Edi Bice
>
> It appears that Samoa is capable of streaming a collection of files as a single
> stream, effectively concatenating the files. However, when using Samoa
> AvroFileStream from HDFSFileStreamSource, the stream seems to stop at the end
> of the first file:
> bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -i -1 -l (classifiers.ensemble.Bagging -s 100) -s (AvroFileStream -s HDFSFileStreamSource -f /tmp/order_and_feats_flat_avro/2016_02_18/ -c 1 -e binary) -f 10000"
> 2016-02-18 20:43:20,991 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:183) - last event is received!
> 2016-02-18 20:43:20,991 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:184) - total count: 262144
> ...
> 2016-02-18 20:43:20,993 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:191) - total evaluation time: 34 seconds for 262144 instances
> bash-4.1$ hadoop fs -ls /tmp/order_and_feats_flat_avro/2016_02_18 | more
> Found 70 items
> -rw-r--r-- 3 yarn hdfs 230855335 2016-02-18 16:01 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00001
> -rw-r--r-- 3 yarn hdfs 229800273 2016-02-18 16:04 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00002
> ...