[ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294046#comment-14294046 ]
Nicolas Garneau commented on SPARK-5119:
----------------------------------------

Hey guys,

I am wondering what you think about letting users control whether their feature vectors are 0-based or 1-based. I am used to 0-based vectors in my datasets (I have worked a lot with scikit-learn), and I saw that the loadLibSVMFile function is "converting" every vector to 0-based...
I thought it would be cool to add an optional parameter or something...

Thanks guys, I'd be glad to give you some help :)

> java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-5119
>                 URL: https://issues.apache.org/jira/browse/SPARK-5119
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 1.1.0, 1.2.0
>         Environment: Linux ubuntu 14.04
>            Reporter: Vivek Kulkarni
>            Assignee: Kai Sasaki
>             Fix For: 1.3.0
>
>
> First I tried to see if a bug with a similar trace had been raised before. I
> found https://www.mail-archive.com/user@spark.apache.org/msg13708.html but
> the suggestion to upgrade to the latest code base (I cloned from the master
> branch) does not fix this issue.
> Issue: try to train a decision tree classifier on some data. After training,
> when it begins to collect, it crashes:
> 15/01/06 22:28:15 INFO BlockManagerMaster: Updated info of block rdd_52_1
> 15/01/06 22:28:15 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 1895)
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.spark.mllib.tree.impurity.GiniAggregator.update(Gini.scala:93)
>         at org.apache.spark.mllib.tree.impl.DTStatsAggregator.update(DTStatsAggregator.scala:100)
>         at org.apache.spark.mllib.tree.DecisionTree$.orderedBinSeqOp(DecisionTree.scala:419)
>         at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$nodeBinSeqOp$1(DecisionTree.scala:511)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:536)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:533)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>         at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:533)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> Minimal code:
> from pyspark.mllib.tree import DecisionTree
> from pyspark.mllib.util import MLUtils
> data = MLUtils.loadLibSVMFile(sc, '/scratch1/vivek/datasets/private/a1a').cache()
> model = DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)
> Just download the data from:
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a1a
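
To make the indexing suggestion in the comment a bit more concrete, here is a rough sketch in plain Python of what an optional flag could look like. It is not Spark code: `parse_libsvm_line` and its `zero_based` parameter are hypothetical names made up for illustration (the parameter name is borrowed from the `zero_based` option of scikit-learn's `load_svmlight_file`). The sketch assumes the input follows the usual `label index:value ...` layout and always returns 0-based indices.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str, zero_based: bool = False) -> Tuple[float, Dict[int, float]]:
    """Parse one LIBSVM-style line into (label, {0-based index: value}).

    `zero_based` is a hypothetical option, not an existing Spark parameter:
    False means the file uses the conventional 1-based indices,
    True means the indices in the file already start at 0.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")

    label = float(tokens[0])
    # Shift by one only when the file is 1-based.
    offset = 0 if zero_based else 1

    features: Dict[int, float] = {}
    for token in tokens[1:]:
        raw_index, raw_value = token.split(":", 1)
        index = int(raw_index) - offset
        if index < 0:
            # Fail early rather than produce a negative array index later.
            raise ValueError("index %s is invalid for this index base" % raw_index)
        features[index] = float(raw_value)
    return label, features


if __name__ == "__main__":
    sample = "-1 3:1 11:1 14:1"
    print(parse_libsvm_line(sample))                   # indices read as 1-based
    print(parse_libsvm_line(sample, zero_based=True))  # indices read as 0-based
```

With the default, index 3 in the file becomes position 2 in the vector. With `zero_based=True` it stays at position 3. Rejecting an index that would go negative keeps a 0-based file read as 1-based from silently shifting every feature.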