[jira] [Updated] (SPARK-7781) GradientBoostedTrees.trainRegressor is missing maxBins parameter in pyspark

Joseph K. Bradley (JIRA) Thu, 21 May 2015 10:43:00 -0700

     [ https://issues.apache.org/jira/browse/SPARK-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Joseph K. Bradley updated SPARK-7781:
-------------------------------------
    Target Version/s: 1.5.0

> GradientBoostedTrees.trainRegressor is missing maxBins parameter in pyspark
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-7781
>                 URL: https://issues.apache.org/jira/browse/SPARK-7781
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.3.1
>            Reporter: Don Drake
>
> I'm running Spark v1.3.1 and when I run the following against my dataset:
> {code}
> model = GradientBoostedTrees.trainRegressor(trainingData,
>     categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
> {code}
> The job will fail with the following message:
> {code}
> Traceback (most recent call last):
>   File "/Users/drake/fd/spark/mltest.py", line 73, in <module>
>     model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py", line 553, in trainRegressor
>     loss, numIterations, learningRate, maxDepth)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/tree.py", line 438, in _train
>     loss, numIterations, learningRate, maxDepth)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 120, in callMLlibFunc
>     return callJavaFunc(sc, api, *args)
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 113, in callJavaFunc
>     return _java2py(sc, func(*args))
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/drake/spark/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> 15/05/20 16:40:12 INFO BlockManager: Removing block rdd_32_95
> py4j.protocol.Py4JJavaError: An error occurred while calling o69.trainGradientBoostedTreesModel.
> : java.lang.IllegalArgumentException: requirement failed: DecisionTree requires maxBins (= 32) >= max categories in categorical features (= 1895)
>       at scala.Predef$.require(Predef.scala:233)
>       at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:128)
>       at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:138)
>       at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
>       at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
>       at org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
>       at org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
>       at org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:595)
> {code}
> So it's complaining about maxBins. If I provide maxBins=1900 and re-run it:
> {code}
> model = GradientBoostedTrees.trainRegressor(trainingData, 
> categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3, 
> maxBins=1900)
> Traceback (most recent call last):
>   File "/Users/drake/fd/spark/mltest.py", line 73, in <module>
>     model = GradientBoostedTrees.trainRegressor(trainingData, categoricalFeaturesInfo=catFeatures, maxDepth=6, numIterations=3, maxBins=1900)
> TypeError: trainRegressor() got an unexpected keyword argument 'maxBins'
> {code}
> It now says it knows nothing of maxBins.
> If I run the same command against DecisionTree or RandomForest (with 
> maxBins=1900) it works just fine.
> Seems like a bug in GradientBoostedTrees. 
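
The two failures above can be illustrated without a Spark cluster. The following is a minimal sketch in plain Python, not pyspark code: `required_max_bins`, `accepts_keyword`, and `train_regressor_stub` are hypothetical names introduced here, and the stub only mirrors the keyword arguments visible in the report and traceback (its default values are assumptions). The first helper restates the requirement from the error message (maxBins >= largest category count); the second checks whether a callable would accept a `maxBins` keyword.

```python
import inspect

DEFAULT_MAX_BINS = 32  # the maxBins value reported in the error message


def required_max_bins(categorical_features_info, default=DEFAULT_MAX_BINS):
    """Smallest maxBins that satisfies maxBins >= largest category count."""
    largest = max(categorical_features_info.values(), default=0)
    return max(default, largest)


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in with the keyword arguments seen in the report.
# It is NOT the real pyspark function, and its defaults are assumptions.
def train_regressor_stub(data, categoricalFeaturesInfo, loss="leastSquaresError",
                         numIterations=100, learningRate=0.1, maxDepth=3):
    return None


# Hypothetical mapping: feature index -> number of categories.
cat_features = {0: 1895, 3: 12}
print(required_max_bins(cat_features))                   # prints 1895
print(accepts_keyword(train_regressor_stub, "maxBins"))  # prints False
```

With a 1895-category feature, any maxBins of at least 1895 (such as the 1900 tried above) would meet the requirement, but only if the training function exposes the parameter at all, which is what the second check reports.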



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
