[jira] [Commented] (SPARK-5119) java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree model

Nicolas Garneau (JIRA) Tue, 27 Jan 2015 11:40:05 -0800

    [ https://issues.apache.org/jira/browse/SPARK-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294046#comment-14294046 ]


Nicolas Garneau commented on SPARK-5119:
----------------------------------------

Hey guys, I am wondering what you think about letting users control whether their 
feature vectors are 0-based or 1-based. I have been using 0-based vectors for my 
datasets (I worked a lot with scikit-learn), and I saw that the loadLibSVMFile 
function converts every vector's indices to 0-based.
I thought it would be cool to add an optional parameter or something similar.
Thanks guys, I'd be glad to give you some help :)
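
To make the suggestion concrete, here is a minimal plain-Python sketch of what such an option could look like. It is not Spark's implementation: the function name parse_libsvm_line and the zero_based parameter are hypothetical, and the only assumption taken from the comment above is that the loader treats on-disk indices as 1-based and shifts them down by one.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str, zero_based: bool = False) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label idx:val idx:val ...' line.

    Returns (label, {index: value}) with indices always 0-based in memory.
    `zero_based` is a hypothetical flag saying how the file itself is indexed.
    """
    parts = line.split()
    label = float(parts[0])
    # 1-based files need a shift of one; 0-based files are used as-is.
    offset = 0 if zero_based else 1
    features: Dict[int, float] = {}
    for item in parts[1:]:
        idx_str, val_str = item.split(":")
        index = int(idx_str) - offset
        if index < 0:
            # A 0-based file read as 1-based would produce index -1 here.
            raise ValueError(f"index {idx_str} is invalid for this indexing mode")
        features[index] = float(val_str)
    return label, features
```

With the flag off, "1 1:0.5 3:2.0" and, with the flag on, "1 0:0.5 2:2.0" describe the same vector; reading a 0-based file without the flag fails loudly instead of silently producing a negative index.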

> java.lang.ArrayIndexOutOfBoundsException on trying to train decision tree 
> model
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-5119
>                 URL: https://issues.apache.org/jira/browse/SPARK-5119
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 1.1.0, 1.2.0
>         Environment: Linux ubuntu 14.04
>            Reporter: Vivek Kulkarni
>            Assignee: Kai Sasaki
>             Fix For: 1.3.0
>
>
> First I tried to see if a bug with a similar trace had been raised before. I 
> found https://www.mail-archive.com/user@spark.apache.org/msg13708.html but 
> the suggestion to upgrade to the latest code base (I cloned from the master 
> branch) does not fix this issue.
> Issue: try to train a decision tree classifier on some data. After training, 
> when it begins the collect, it crashes:
> 15/01/06 22:28:15 INFO BlockManagerMaster: Updated info of block rdd_52_1
> 15/01/06 22:28:15 ERROR Executor: Exception in task 1.0 in stage 31.0 (TID 1895)
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.spark.mllib.tree.impurity.GiniAggregator.update(Gini.scala:93)
>         at org.apache.spark.mllib.tree.impl.DTStatsAggregator.update(DTStatsAggregator.scala:100)
>         at org.apache.spark.mllib.tree.DecisionTree$.orderedBinSeqOp(DecisionTree.scala:419)
>         at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$nodeBinSeqOp$1(DecisionTree.scala:511)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:536)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1$1.apply(DecisionTree.scala:533)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>         at org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$binSeqOp$1(DecisionTree.scala:533)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
>         at org.apache.spark.mllib.tree.DecisionTree$$anonfun$6$$anonfun$apply$8.apply(DecisionTree.scala:628)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
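> Minimal code:
> from pyspark.mllib.util import MLUtils
> from pyspark.mllib.tree import DecisionTree
> # sc is an existing SparkContext, e.g. the one provided by the pyspark shell
> data = MLUtils.loadLibSVMFile(sc, '/scratch1/vivek/datasets/private/a1a').cache()
> model = DecisionTree.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)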
> Just download the data from: 
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a1a
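
This thread does not state the root cause, so the following is only one plausible reading of the trace: the exception is raised with index -1 inside GiniAggregator.update, the a1a file labels its two classes as -1 and +1, and MLlib's classification trees expect labels in 0 .. numClasses-1. Under that assumption, a workaround would be to remap the labels before training. The helper name remap_binary_label is hypothetical.

```python
def remap_binary_label(label: float) -> float:
    """Map a -1/+1 (or already 0/1) binary label onto 0.0/1.0."""
    if label == -1.0:
        return 0.0
    if label in (0.0, 1.0):
        return label
    raise ValueError(f"unexpected binary label: {label}")


# Assumed usage with the RDD from the minimal code above (not run here):
#   from pyspark.mllib.regression import LabeledPoint
#   fixed = data.map(lambda p: LabeledPoint(remap_binary_label(p.label), p.features))
#   model = DecisionTree.trainClassifier(fixed, numClasses=2,
#                                        categoricalFeaturesInfo={}, maxDepth=5, maxBins=100)
```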



