[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953084#comment-15953084 ]
sudipto pal commented on SPARK-20199:
-------------------------------------
[~srowen]
GBM tuning parameters unavailable in Spark 2.1.0:
1. Column sampling rate: present in H2O and XGBoost; an important feature
2. Regularization on leaf node weights: present in XGBoost
3. Learning rate annealing: present in H2O
Other features missing (compared to H2O and/or XGBoost):
4. Multiclass classification is not supported
5. Offset: present in H2O
6. The choice of distributions does not include Gamma, Tweedie, or Poisson
7. Generates classes, not probabilities (they said a later version will
take care of this)
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/ml/classification/GBTClassifier.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.regression.GBTRegressor
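
To make item 1 concrete, here is a minimal sketch of what a column sampling rate usually means in gradient boosting: each tree is trained on a random subset of the feature columns. This is an illustration only, not the Spark, H2O, or XGBoost implementation. The function name `sample_columns` and its parameters are hypothetical, and the per-tree granularity is an assumption (libraries differ on whether sampling happens per tree, per level, or per split).

```python
import random


def sample_columns(num_features, col_sample_rate, rng):
    """Pick the feature indices one tree is allowed to split on.

    Hypothetical helper for illustration; not an API of any library.
    """
    if not 0.0 < col_sample_rate <= 1.0:
        raise ValueError("col_sample_rate must be in (0, 1]")
    # Always keep at least one feature so the tree can still split.
    k = max(1, int(round(num_features * col_sample_rate)))
    # Sample without replacement, then sort for readable output.
    return sorted(rng.sample(range(num_features), k))


rng = random.Random(42)  # fixed seed so repeated runs draw the same subsets
# One independent draw per boosting iteration (assumed per-tree sampling).
per_tree_features = [sample_columns(10, 0.5, rng) for _ in range(3)]
for i, cols in enumerate(per_tree_features):
    print(f"tree {i}: features {cols}")
```

With a rate of 0.5 and 10 features, each tree sees 5 columns, so successive trees are less correlated and each one is cheaper to train.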
> GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter
> -----------------------------------------------------------------------
>
> Key: SPARK-20199
> URL: https://issues.apache.org/jira/browse/SPARK-20199
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.1.0
> Reporter: pralabhkumar
> Priority: Minor
>
> Spark GradientBoostedTreesModel doesn't have a column sampling rate parameter.
> This parameter is available in H2O and XGBoost.
> Sample from H2O.ai:
> gbmParams._col_sample_rate
> Please provide the parameter.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)