[ https://issues.apache.org/jira/browse/SPARK-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-7781:
-------------------------------------
    Target Version/s: 1.5.0

> GradientBoostedTrees.trainRegressor is missing maxBins parameter in pyspark
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-7781
>                 URL: https://issues.apache.org/jira/browse/SPARK-7781
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.3.1
>            Reporter: Don Drake
>
> I'm running Spark v1.3.1 and when I run the following against my dataset:
> {code}
> model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
> {code}
> The job will fail with the following message:
> {code}
> Traceback (most recent call last):
>   File "/Users/drake/fd/spark/mltest.py", line 73, in <module>
>     model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py", line 553, in trainRegressor
>     loss, numIterations, learningRate, maxDepth)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py", line 438, in _train
>     loss, numIterations, learningRate, maxDepth)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 120, in callMLlibFunc
>     return callJavaFunc(sc, api, *args)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 113, in callJavaFunc
>     return _java2py(sc, func(*args))
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> 15/05/20 16:40:12 INFO BlockManager: Removing block rdd_32_95
> py4j.protocol.Py4JJavaError: An error occurred while calling o69.trainGradientBoostedTreesModel.
> : java.lang.IllegalArgumentException: requirement failed: DecisionTree requires maxBins (= 32) >= max categories in categorical features (= 1895)
>         at scala.Predef$.require(Predef.scala:233)
>         at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:128)
>         at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:138)
>         at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
>         at org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:595)
> {code}
>
> So it's complaining about maxBins. If I provide maxBins=1900 and re-run it:
> {code}
> model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3, maxBins=1900)
>
> Traceback (most recent call last):
>   File "/Users/drake/fd/spark/mltest.py", line 73, in <module>
>     model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3, maxBins=1900)
> TypeError: trainRegressor() got an unexpected keyword argument 'maxBins'
> {code}
>
> It now says it knows nothing of maxBins.
> If I run the same command against DecisionTree or RandomForest (with maxBins=1900), it works just fine.
> Seems like a bug in GradientBoostedTrees.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
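The IllegalArgumentException in the report states the constraint directly: maxBins (default 32 in the message) must be at least the largest category count of any feature listed in categoricalFeaturesInfo (1895 in this dataset). The sketch below computes that lower bound from a categoricalFeaturesInfo-style dict (feature index mapped to number of categories). The helper name `required_max_bins` and the example feature map are hypothetical, for illustration only; this is plain Python and does not call Spark.

```python
from typing import Dict


def required_max_bins(categorical_features_info: Dict[int, int],
                      default_max_bins: int = 32) -> int:
    """Hypothetical helper: smallest maxBins satisfying the check quoted in
    the exception (maxBins >= max categories in categorical features)."""
    # categoricalFeaturesInfo maps feature index -> number of categories.
    max_categories = max(categorical_features_info.values(), default=0)
    # Keep the default unless some categorical feature needs more bins.
    return max(default_max_bins, max_categories)


if __name__ == "__main__":
    # Hypothetical feature map: feature 0 has 1895 categories, as in the report.
    cat_features = {0: 1895, 3: 12}
    print(required_max_bins(cat_features))  # 1895
```

With the reporter's data this yields 1895, so the maxBins=1900 they tried would satisfy the check. Actually passing the value to GradientBoostedTrees.trainRegressor assumes a PySpark release in which that method accepts a maxBins keyword (this ticket targets 1.5.0); on 1.3.1 the call fails with the TypeError shown above.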