Vivek Kulkarni created SPARK-5119:
-------------------------------------

             Summary: java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
                 Key: SPARK-5119
                 URL: https://issues.apache.org/jira/browse/SPARK-5119
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib
    Affects Versions: 1.2.0, 1.1.0
         Environment: Linux (Ubuntu 14.04)
            Reporter: Vivek Kulkarni


I first checked whether a bug with a similar trace had been raised before. I 
found https://www.mail-archive.com/[email protected]/msg13708.html, but the 
suggestion there to upgrade to the latest code base (I cloned from the master 
branch) does not fix this issue.

Issue: I try to train a decision tree classifier on some data. After training, 
when the collect begins, it crashes:

15/01/06 22:28:15 INFO BlockManagerMaster: Updated info of block rdd_52_1
15/01/06 22:28:15 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 1895)
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.spark.mllib.tree.impurity.GiniAggregator.update(Gini.scala:93)
        at org.apache.spark.mllib.tree.impl.DTStatsAggregator.update(DTStatsAggregator.scala:100)
        at org.apache.spark.mllib.tree.DecisionTree$.orderedBinSeqOp(DecisionTree.scala:419)
        at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$nodeBinSeqOp$1(DecisionTree.scala:511)
        at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:536)
        at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:533)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
        at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:533)
        at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
        at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)

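Minimal code:
from pyspark.mllib.tree import DecisionTree
from pyspark.mllib.util import MLUtils

# Assumes sc is an existing SparkContext (for example, from the pyspark shell).
data = MLUtils.loadLibSVMFile(sc, '/scratch1/vivek/datasets/private/a1a').cache()

model = DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)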
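
One thing that may be worth checking before training, offered as a sketch and not as a confirmed diagnosis: the MLlib documentation for trainClassifier says labels should take values in {0, ..., numClasses-1}, and the exception above reports index -1. The helper below (invalid_labels is a hypothetical name, not part of MLlib, and the sample labels are made up) lists the labels that fall outside that range.

```python
def invalid_labels(labels, num_classes):
    """Return the sorted distinct labels outside {0, ..., num_classes - 1}."""
    valid = set(range(num_classes))
    return sorted({label for label in labels if label not in valid})


# Made-up labels in the +1/-1 convention, for illustration only.
sample_labels = [-1.0, 1.0, 1.0, -1.0]
print(invalid_labels(sample_labels, num_classes=2))
```

With the RDD from the report, the same check could presumably be run on the real labels via invalid_labels(data.map(lambda lp: lp.label).distinct().collect(), 2).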

To reproduce, download the data from: 
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a1a
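
The a1a file is in LibSVM text format, where each line starts with the label followed by index:value pairs, and its labels are +1 and -1. As a sketch only (not a confirmed fix for this issue), the hypothetical helpers below read the label from a line and map -1 to 0 so that the labels fall in {0, 1}. The sample lines are made up and are not taken from the a1a file.

```python
def parse_label(line):
    """Return the label (first whitespace-separated token) of a LibSVM line."""
    return float(line.split()[0])


def remap_label(label):
    """Map the -1/+1 convention to 0/1; leave any other label unchanged."""
    return 0.0 if label == -1.0 else label


# Made-up lines in LibSVM format, for illustration only.
lines = ["-1 3:1 11:1 14:1", "+1 5:1 7:1 14:1"]
labels = [remap_label(parse_label(line)) for line in lines]
print(labels)
```

On an RDD of LabeledPoint, the same mapping could be applied before training with something like data.map(lambda lp: LabeledPoint(remap_label(lp.label), lp.features)), with LabeledPoint imported from pyspark.mllib.regression.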



