Sure, the ticket is SAMOA-44 <https://issues.apache.org/jira/browse/SAMOA-44>.
Arinto had started the work on model dumping, but I don't know what the
status is there. In any case, it should be straightforward to implement a
recursive method.

If you could post the dataset somewhere where it can be downloaded, that
would be great.
If you want to take a stab at debugging what's going on and provide a
patch, that would be even better.

Cheers,

--
Gianmarco

On 10 September 2015 at 08:49, Ercan Öztürk <[email protected]> wrote:

> Hi,
>
> Thank you very much for your quick response.
>
> We were using an older version of SAMOA. I've updated the code now (the
> last commit is currently "SAMOA-29: Excluding the samoa-storm.properties at
> compile time and including at test") and, after building the code with "mvn
> package", the new command we use to run SAMOA is
>
> local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> "PrequentialEvaluation -i -1 -f 41920 -l
> (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s
> (org.apache.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"
>
> The console output when the command is run:
>
> bin/samoa
> Deploying to LOCAL
> Command line string = PrequentialEvaluation -i -1 -f 41920 -l
> (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s
> (org.apache.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
> 2015-09-09 15:56:30,036 [main] INFO org.apache.samoa.LocalDoTask
> (LocalDoTask.java:80) - Successfully instantiating
> org.apache.samoa.tasks.PrequentialEvaluation
> 2015-09-09 15:56:31,221 [main] INFO
> org.apache.samoa.evaluation.EvaluatorProcessor
> (EvaluatorProcessor.java:83) - 1 seconds for 41920 instances
> 2015-09-09 15:56:31,227 [main] INFO
> org.apache.samoa.evaluation.EvaluatorProcessor
> (EvaluatorProcessor.java:169) - evaluation instances = 41,920
> classified instances = 41,920
> classifications correct (percent) = 99.988
> Kappa Statistic (percent) = -0.002
> Kappa Temporal Statistic (percent) = 28.571
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:142)
> at org.apache.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:72)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:61)
> at org.apache.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:93)
> at org.apache.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:72)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:61)
> at org.apache.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:45)
> at org.apache.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:63)
> at org.apache.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:44)
> at org.apache.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
> at org.apache.samoa.LocalDoTask.main(LocalDoTask.java:88)
>
>
> We would very much appreciate it if you could send us the link for the
> ticket so we can follow the updates on the issue.
>
> Yes, we would like to dump the model so that we can see the rules of the
> model and have a better understanding of it.
>
> The method body of describeSubtree() in Node.java is currently empty. Is
> there any work done on it that we can use as a starting point?
>
> If you need the data set to investigate the issue, I can send it via any
> suitable channel, please let me know.
>
> Respectfully,
> Ercan Ozturk
>
> 2015-09-09 15:11 GMT+03:00 Gianmarco De Francisci Morales <[email protected]>:
>
>> Hi,
>>
>> Thanks for reporting the bug.
>> I'm not sure what is causing the issue.
>> Are you using the master version of SAMOA?
>> My line 145 of ModelAggregator is:
>> this.sendToAttributeStream(abce[i]);
>>
>> From what you say it seems that the problem is a bit above, and leafNode
>> is null.
>> However, by construction there should always be a leaf node.
>>
>> As a workaround your solution is fine, but I guess there is some other
>> underlying problem with the code, which might cause some loss in accuracy.
>> We should investigate this issue further, I'll open a ticket.
>>
>> Regarding fetching the content of the model, we had some prototype model
>> dumper code (Arinto had started it), but I guess it's not working anymore.
>> See the describeSubtree() method in Node.java.
>> So unfortunately you need to do it yourself. However, the good thing is
>> that the tree model is in a single place in ModelAggregator, so it should
>> be relatively easy to walk the tree, starting from the root node.
>> Do you want to dump the model to a text representation for human
>> inspection?
>>
>> Cheers,
>>
>>
>> --
>> Gianmarco
>>
>> On 7 September 2015 at 18:23, Gianmarco De Francisci Morales <
>> [email protected]> wrote:
>>
>>> Forwarding to the @dev list.
>>> --
>>> Gianmarco
>>>
>>> ---------- Forwarded message ----------
>>> From: Ercan Öztürk <[email protected]>
>>> Date: 7 September 2015 at 16:57
>>> Subject: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD
>>> Cup 99 Data Set
>>> To: [email protected]
>>>
>>>
>>> Hi Mr. Morales and Mr. Bifet,
>>>
>>> We are a couple of undergrad students from TOBB University. As a data
>>> mining class project, we decided to run the HoeffdingTree classifier in
>>> MOA and the VerticalHoeffdingTree classifier in SAMOA on the KDD Cup 99
>>> data set (we couldn't attach the data set to this mail due to the size
>>> limitations of the Apache mail server) and present the results in our
>>> project report.
>>>
>>> We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data
>>> set (both on kddcup_full.arff and kddcup_10_percent.arff).
>>> VerticalHoeffdingTree classifier also works fine on
>>> kddcup_10_percent.arff. However, when we try to run the
>>> VerticalHoeffdingTree classifier on kddcup_full.arff, we get the
>>> following error:
>>>
>>> The command we use to run SAMOA Local:
>>>
>>> bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar
>>> "PrequentialEvaluation -i -1 -f 41920 -l
>>> (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
>>> 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"
>>>
>>> The console output of samoa:
>>>
>>> bin/samoa
>>> Deploying to LOCAL
>>> Command line string = PrequentialEvaluation -i -1 -f 41920 -l
>>> (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
>>> 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
>>> 2015-09-01 22:22:16,160 [main] INFO com.yahoo.labs.samoa.LocalDoTask
>>> (LocalDoTask.java:79) - Successfully instantiating
>>> com.yahoo.labs.samoa.tasks.PrequentialEvaluation
>>> 2015-09-01 22:22:17,741 [main] INFO
>>> com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
>>> (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
>>> 2015-09-01 22:22:17,760 [main] INFO
>>> com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
>>> (EvaluatorProcessor.java:172) - evaluation instances = 41,920
>>> classified instances = 41,920
>>> classifications correct (percent) = 99.988
>>> Kappa Statistic (percent) = -0.002
>>> Kappa Temporal Statistic (percent) = 28.571
>>> Exception in thread "main" java.lang.NullPointerException
>>> at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>>> at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>>> at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
>>> at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
>>> at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)
>>>
>>>
>>> We were able to track down the problem to the first instance that causes
>>> it; the instance is on the 76426th line in kddcup_full.arff. The
>>> instance is as follows:
>>>
>>> 1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal
>>>
>>> We haven’t noticed any differences between the problematic instance and
>>> the other instances. Could you point us to the root of the problem and
>>> help us overcome it?
>>>
>>> As a workaround we’ve made the following addition to
>>> ModelAggregatorProcessor.java
>>>
>>> if (leafNode == null)
>>>     return false;
>>>
>>> after the line
>>>
>>> ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();
>>>
>>> Now the VerticalHoeffdingTree classifier also works fine on
>>> kddcup_full.arff. Is this solution acceptable for the problem? What do
>>> you think?
>>>
>>>
>>> Besides, we were wondering how we could fetch the model contents, for
>>> example by visiting the nodes and reading their content.
>>>
>>> Thanks for your help,
>>>
>>>
>>> Respectfully,
>>>
>>> Ercan Ozturk, Davut Deniz Yavuz, Gozde Boztepe, Sezin Gurkan
>>>
>>>
>>
>
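
For reference, the recursive walk suggested in this thread (start at the root node and describe each subtree) could look roughly like the sketch below. It is only an illustration under stated assumptions: the Node, Leaf and Split types are hypothetical stand-ins written for this example and are not SAMOA's actual Node classes, the describeSubtree signature is invented here, and the attribute values in main are made up. It also prints a placeholder for a missing child instead of dereferencing it, since the thread reports a null leaf node in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are NOT SAMOA's classes, only stand-ins.
final class ModelDumpSketch {

    /** A tree node: either a Split on one attribute or a Leaf. */
    interface Node {}

    /** Leaf holding the class label it would predict. */
    static final class Leaf implements Node {
        final String predictedClass;
        Leaf(String predictedClass) { this.predictedClass = predictedClass; }
    }

    /** Split node with one labelled branch per attribute value. */
    static final class Split implements Node {
        final String attribute;
        final List<String> branchLabels = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Split(String attribute) { this.attribute = attribute; }

        /** Adds a branch; the child may be null if that branch was never grown. */
        Split branch(String label, Node child) {
            branchLabels.add(label);
            children.add(child);
            return this;
        }
    }

    /** Returns a human-readable, indented text dump of the whole tree. */
    static String describe(Node root) {
        StringBuilder out = new StringBuilder();
        describeSubtree(root, 0, out);
        return out.toString();
    }

    /** Depth-first recursion: print this node, then recurse into each child. */
    static void describeSubtree(Node node, int depth, StringBuilder out) {
        if (node == null) {                 // tolerate missing children
            indent(depth, out);
            out.append("<empty>\n");
            return;
        }
        if (node instanceof Leaf) {
            indent(depth, out);
            out.append("Leaf: class = ").append(((Leaf) node).predictedClass).append('\n');
            return;
        }
        Split split = (Split) node;
        for (int i = 0; i < split.children.size(); i++) {
            indent(depth, out);
            out.append("if ").append(split.attribute)
               .append(" = ").append(split.branchLabels.get(i)).append(":\n");
            describeSubtree(split.children.get(i), depth + 1, out);
        }
    }

    /** Two spaces of indentation per tree level. */
    private static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        // Made-up tree for illustration only; not a model learned from KDD Cup 99.
        Node root = new Split("protocol_type")
                .branch("tcp", new Split("service")
                        .branch("smtp", new Leaf("normal"))
                        .branch("other", null))
                .branch("udp", new Leaf("normal"));
        System.out.print(describe(root));
    }
}
```

Writing into a StringBuilder rather than printing directly keeps the dump easy to log or save to a file. In the real code base the equivalent method would have to be adapted to whatever the tree nodes in ModelAggregatorProcessor actually expose.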
