Gianmarco De Francisci Morales created SAMOA-44:
---------------------------------------------------

             Summary: NPE when running VHT on KDD cup data
                 Key: SAMOA-44
                 URL: https://issues.apache.org/jira/browse/SAMOA-44
             Project: SAMOA
          Issue Type: Bug
          Components: SAMOA-API
            Reporter: Gianmarco De Francisci Morales


From the mailing list:

We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data set 
(both kddcup_full.arff and kddcup_10_percent.arff). The VerticalHoeffdingTree 
classifier also works fine on kddcup_10_percent.arff. However, when we run the 
VerticalHoeffdingTree classifier on kddcup_full.arff, we get a 
NullPointerException (see the console output further down).

The command we use to run SAMOA Local:

bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar "PrequentialEvaluation -i -1 -f 41920 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"

The console output of samoa:

bin/samoa
Deploying to LOCAL
Command line string =  PrequentialEvaluation -i -1 -f 41920 -l (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
2015-09-01 22:22:16,160 [main] INFO  com.yahoo.labs.samoa.LocalDoTask (LocalDoTask.java:79) - Successfully instantiating com.yahoo.labs.samoa.tasks.PrequentialEvaluation
2015-09-01 22:22:17,741 [main] INFO  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
2015-09-01 22:22:17,760 [main] INFO  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:172) - evaluation instances = 41,920
classified instances = 41,920
classifications correct (percent) = 99.988
Kappa Statistic (percent) = -0.002
Kappa Temporal Statistic (percent) = 28.571
Exception in thread "main" java.lang.NullPointerException
        at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
        at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
        at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
        at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
        at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
        at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
        at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
        at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
        at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
        at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
        at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
        at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
        at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)


We tracked the problem down to the first instance that triggers it, which is 
on line 76426 of kddcup_full.arff. The instance is as follows:

1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal

We haven’t noticed any differences between the problematic instance and the 
other instances. Could you point us to the root cause and suggest how to 
overcome it?
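
One way to double-check that the record is structurally ordinary is to pull 
it out of the file by line number and count its comma-separated fields. The 
following is only a minimal sketch: the class InstanceCheck and its methods 
readLine and countFields are hypothetical helpers written for this 
illustration, not part of SAMOA, and the file name and line number are simply 
the ones quoted in this report.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for this report; not part of SAMOA.
public class InstanceCheck {

    /** Returns the given 1-based line of the file, or null if the file is shorter. */
    static String readLine(Path file, long lineNumber) throws IOException {
        // ISO-8859-1 maps every byte to a character, so decoding never fails.
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            long current = 0;
            while ((line = reader.readLine()) != null) {
                if (++current == lineNumber) {
                    return line;
                }
            }
        }
        return null;
    }

    /** Counts comma-separated fields, keeping trailing empty fields. */
    static int countFields(String record) {
        return record.split(",", -1).length;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are the file and line number mentioned in this report.
        Path file = Paths.get(args.length > 0 ? args[0] : "kddcup_full.arff");
        long lineNumber = args.length > 1 ? Long.parseLong(args[1]) : 76426L;

        String record = readLine(file, lineNumber);
        if (record == null) {
            System.out.println("File has fewer than " + lineNumber + " lines");
            return;
        }
        System.out.println(record);
        System.out.println("fields = " + countFields(record));
    }
}
```

Running the same check on a neighbouring line gives a quick comparison. The 
instance quoted above has 42 fields (41 attributes plus the class label), 
which is what a KDD Cup 99 record is expected to have.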

As a workaround, we’ve added the following check to 
ModelAggregatorProcessor.java:
if (leafNode == null)
    return false;

after the line 

ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();

With this change, the VerticalHoeffdingTree classifier also works fine on 
kddcup_full.arff. Do you think this is an acceptable solution to the problem?
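
To make the workaround concrete outside of SAMOA, here is a small 
self-contained sketch of the same null-guard pattern. The classes LeafNode 
and Found and the method train are hypothetical stand-ins invented for this 
illustration; they are not the SAMOA types, and the sketch does not explain 
why the lookup returns null in the first place.

```java
// Hypothetical stand-ins for illustration only; not the SAMOA classes.
public class NullLeafGuard {

    /** Stand-in for a leaf that can learn from an instance. */
    static class LeafNode {
        int seen = 0;

        void learn(String instance) {
            seen++;
        }
    }

    /** Stand-in for the result of a tree lookup; the node may be null. */
    static class Found {
        private final LeafNode node;

        Found(LeafNode node) {
            this.node = node;
        }

        LeafNode getNode() {
            return node;
        }
    }

    /** Returns true if the instance was used for training, false if it was skipped. */
    static boolean train(Found found, String instance) {
        LeafNode leaf = found.getNode();
        if (leaf == null) {
            // Same idea as the workaround: skip instead of dereferencing null.
            return false;
        }
        leaf.learn(instance);
        return true;
    }

    public static void main(String[] args) {
        LeafNode leaf = new LeafNode();
        System.out.println(train(new Found(leaf), "first"));  // prints "true"
        System.out.println(train(new Found(null), "second")); // prints "false"
        System.out.println(leaf.seen);                        // prints "1"
    }
}
```

Note that a guard like this silently drops every instance for which no leaf 
is found, so those instances no longer contribute to training. That is worth 
keeping in mind when judging whether the workaround is acceptable.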




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
