Hi,

Thank you very much for your quick response.

We were using an older version of SAMOA. I've now updated the code (the latest commit is currently "SAMOA-29: Excluding the samoa-storm.properties at compile time and including at test"). After building the code with "mvn package", the new command we use to run SAMOA is:

bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar "PrequentialEvaluation -i -1 -f 41920 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"

The console output when the command is run:

bin/samoa
Deploying to LOCAL
Command line string = PrequentialEvaluation -i -1 -f 41920 -l (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (org.apache.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
2015-09-09 15:56:30,036 [main] INFO org.apache.samoa.LocalDoTask (LocalDoTask.java:80) - Successfully instantiating org.apache.samoa.tasks.PrequentialEvaluation
2015-09-09 15:56:31,221 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:83) - 1 seconds for 41920 instances
2015-09-09 15:56:31,227 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:169) - evaluation instances = 41,920
classified instances = 41,920
classifications correct (percent) = 99.988
Kappa Statistic (percent) = -0.002
Kappa Temporal Statistic (percent) = 28.571
Exception in thread "main" java.lang.NullPointerException
	at org.apache.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:142)
	at org.apache.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
	at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:72)
	at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:61)
	at org.apache.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:93)
	at org.apache.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
	at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:72)
	at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:61)
	at org.apache.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:45)
	at org.apache.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:63)
	at org.apache.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:44)
	at org.apache.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
	at org.apache.samoa.LocalDoTask.main(LocalDoTask.java:88)

We would very much appreciate it if you could send us the link to the ticket so that we can follow the updates on the issue.

Yes, we would like to dump the model so that we can see its rules and understand it better. The method body of describeSubtree() in Node.java is currently empty. Is there any existing work on it that we could use as a starting point?

If you need the data set to investigate the issue, I can send it via any suitable channel; please let me know.

Respectfully,
Ercan Ozturk

2015-09-09 15:11 GMT+03:00 Gianmarco De Francisci Morales <[email protected]>:

> Hi,
>
> Thanks for reporting the bug.
> I'm not sure what is causing the issue.
> Are you using the master version of SAMOA?
> My line 145 of ModelAggregator is:
> this.sendToAttributeStream(abce[i]);
>
> From what you say it seems that the problem is a bit above, and leafNode
> is null.
> However, by construction there should always be a leaf node.
>
> As a workaround your solution is fine, but I guess there is some other
> underlying problem with the code, which might cause some loss in accuracy.
> We should investigate this issue further, I'll open a ticket.
>
> Regarding fetching the content of the model, we had some prototype model
> dumper code (Arinto had started it), but I guess it's not working anymore.
> See the describeSubtree() method in Node.java.
> So unfortunately you need to do it yourself.
> However, the good thing is that the tree model is in a single place in
> ModelAggregator, so it should be relatively easy to walk the tree,
> starting from the root node.
> Do you want to dump the model to a text representation for human
> inspection?
>
> Cheers,
>
> --
> Gianmarco
>
> On 7 September 2015 at 18:23, Gianmarco De Francisci Morales <[email protected]> wrote:
>
>> Forwarding to the @dev list.
>> --
>> Gianmarco
>>
>> ---------- Forwarded message ----------
>> From: Ercan Öztürk <[email protected]>
>> Date: 7 September 2015 at 16:57
>> Subject: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD Cup 99 Data Set
>> To: [email protected]
>>
>> Hi Mr. Morales and Mr. Bifet,
>>
>> We are a couple of undergrad students from TOBB University. As a data
>> mining class project, we decided to run the HoeffdingTree classifier in MOA
>> and the VerticalHoeffdingTree classifier in SAMOA on the KDD Cup 99 data set
>> (we couldn't attach the data set to this mail due to the size limitations of
>> the Apache mail server) and present the results in our project report.
>>
>> We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data set
>> (both kddcup_full.arff and kddcup_10_percent.arff).
>> The VerticalHoeffdingTree classifier also works fine on
>> kddcup_10_percent.arff.
>> However, when we try to run the VerticalHoeffdingTree classifier on
>> kddcup_full.arff, we get the following error:
>>
>> The command we use to run SAMOA Local:
>>
>> bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -i -1 -f 41920 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"
>>
>> The console output of samoa:
>>
>> bin/samoa
>> Deploying to LOCAL
>> Command line string = PrequentialEvaluation -i -1 -f 41920 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
>> 2015-09-01 22:22:16,160 [main] INFO com.yahoo.labs.samoa.LocalDoTask (LocalDoTask.java:79) - Successfully instantiating com.yahoo.labs.samoa.tasks.PrequentialEvaluation
>> 2015-09-01 22:22:17,741 [main] INFO com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
>> 2015-09-01 22:22:17,760 [main] INFO com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:172) - evaluation instances = 41,920
>> classified instances = 41,920
>> classifications correct (percent) = 99.988
>> Kappa Statistic (percent) = -0.002
>> Kappa Temporal Statistic (percent) = 28.571
>> Exception in thread "main" java.lang.NullPointerException
>> 	at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>> 	at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>> 	at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
>> 	at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
>> 	at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
>> 	at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)
>>
>> We were able to track the problem down to the first instance that causes
>> it; the instance is on the 76426th line of kddcup_full.arff. The
>> instance is as follows:
>>
>> 1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal
>>
>> We haven't noticed any differences between the problematic instance and
>> the other instances. Could you point us to the root of the problem and
>> help us overcome it?
>>
>> As a workaround, we've made the following addition to
>> ModelAggregatorProcessor.java:
>>
>> if (leafNode == null)
>>     return false;
>>
>> after the line
>>
>> ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();
>>
>> Now the VerticalHoeffdingTree classifier also works fine on kddcup_full.arff.
>> Do you think this solution is acceptable for the problem?
>>
>> In addition, we were wondering how we could fetch the model contents, for
>> example by visiting the nodes and reading their content.
>>
>> Thanks for your help,
>>
>> Respectfully,
>>
>> Ercan Ozturk, Davut Deniz Yavuz, Gozde Boztepe, Sezin Gurkan
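The thread suggests walking the tree from the root node to get a text dump of the model. A minimal sketch of such a recursive walk is given below, under stated assumptions: SketchNode, SketchLeaf, SketchSplit and TreeDumpSketch are hypothetical stand-ins written for illustration and are NOT SAMOA's Node, SplitNode or ModelAggregatorProcessor classes, and the toy tree in main() is made up rather than learned from kddcup_full.arff. It only illustrates the kind of logic a describeSubtree()-style method could contain, including a guard for a missing child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; not the real SAMOA Node API.
abstract class SketchNode {
    // Append a human-readable description of this subtree to 'out'.
    abstract void describe(StringBuilder out, int indent);

    // Two spaces of indentation per tree level.
    static void pad(StringBuilder out, int indent) {
        for (int i = 0; i < indent; i++) {
            out.append("  ");
        }
    }
}

// Hypothetical leaf that only remembers the class it would predict.
class SketchLeaf extends SketchNode {
    final String predictedClass;

    SketchLeaf(String predictedClass) {
        this.predictedClass = predictedClass;
    }

    @Override
    void describe(StringBuilder out, int indent) {
        pad(out, indent);
        out.append("Leaf -> ").append(predictedClass).append('\n');
    }
}

// Hypothetical split node on one nominal attribute, one child per value.
class SketchSplit extends SketchNode {
    final String attribute;
    final List<String> branchLabels = new ArrayList<>();
    final List<SketchNode> children = new ArrayList<>();

    SketchSplit(String attribute) {
        this.attribute = attribute;
    }

    SketchSplit addBranch(String label, SketchNode child) {
        branchLabels.add(label);
        children.add(child);
        return this;
    }

    @Override
    void describe(StringBuilder out, int indent) {
        for (int i = 0; i < children.size(); i++) {
            pad(out, indent);
            out.append("if ").append(attribute).append(" = ")
               .append(branchLabels.get(i)).append(":\n");
            SketchNode child = children.get(i);
            if (child == null) {
                // Guard against a missing child instead of dereferencing it.
                pad(out, indent + 1);
                out.append("(no child yet)\n");
            } else {
                child.describe(out, indent + 1);
            }
        }
    }
}

class TreeDumpSketch {
    // Walk the tree from the root and return the text dump.
    static String dump(SketchNode root) {
        StringBuilder out = new StringBuilder();
        if (root != null) {
            root.describe(out, 0);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy tree for illustration only.
        SketchNode root = new SketchSplit("protocol_type")
            .addBranch("tcp", new SketchSplit("service")
                .addBranch("smtp", new SketchLeaf("normal"))
                .addBranch("http", null))
            .addBranch("udp", new SketchLeaf("normal"));
        System.out.print(dump(root));
    }
}
```

In the real code base the same walk would have to start from the root held by the model aggregator and use whatever accessors the actual node classes expose; those names are deliberately not guessed here.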
