[ 
https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953084#comment-15953084
 ] 

sudipto pal edited comment on SPARK-20199 at 4/3/17 7:42 AM:
-------------------------------------------------------------

[~srowen]
GBM tuning parameters unavailable in Spark 2.1.0:
1.       Column sampling rate: present in H2O and XGBoost. Each tree is built on
a sample of the features, drawn at a rate specified in advance.
2.       Regularization on leaf node weights: present in XGBoost. Ref: 
http://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-linear-booster
 
3.       Learning rate annealing: present in H2O. Reduces or increases the 
learning rate (called stepSize in Spark) with every subsequent tree.
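
To make items 1 and 3 concrete, here is a minimal sketch in plain Python. It is not Spark, H2O, or XGBoost code. The function names and the parameters col_sample_rate and anneal_factor are hypothetical, and two things are assumed: columns are sampled once per tree (some libraries sample per split or per level instead), and annealing multiplies the step size by a fixed factor after each tree.

```python
import random


def sample_columns(num_features, col_sample_rate, rng):
    """Pick the feature indices one tree is allowed to split on.

    Hypothetical helper: always keeps at least one feature.
    """
    k = max(1, round(col_sample_rate * num_features))
    return sorted(rng.sample(range(num_features), k))


def annealed_step_size(initial_step_size, anneal_factor, tree_index):
    """Step size for the tree at tree_index (0-based).

    Assumes geometric annealing: a factor below 1 shrinks the step size
    with each tree, a factor above 1 grows it.
    """
    return initial_step_size * (anneal_factor ** tree_index)


# Show the per-tree feature subset and step size for the first three trees.
rng = random.Random(42)
for t in range(3):
    cols = sample_columns(num_features=10, col_sample_rate=0.5, rng=rng)
    step = annealed_step_size(0.1, 0.99, t)
    print(t, cols, step)
```

Each boosting iteration would then fit its tree using only the sampled columns and add it to the ensemble scaled by that iteration's step size.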
 
Other features missing (compared to H2O and/or XGBoost):
4.       Multiclass classification is not supported.
5.       Offset: present in H2O. Same as a GLM offset.
6.       The choice of distributions does not include Gamma, Tweedie, or Poisson.
7.       Predictions are class labels, not probabilities (they said a later
version will take care of this).
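
For item 7, the sketch below shows how a probability could be derived from a boosted-trees margin. It is plain Python, not the Spark API. It assumes the ensemble's raw output is a margin F(x) fitted under log loss with labels coded as -1/+1, so that P(y = 1 | x) = 1 / (1 + exp(-2 F(x))). The function names are hypothetical, and whether Spark 2.1.0 exposes such a margin is not something this comment establishes.

```python
import math


def margin_to_probability(margin):
    """Map a raw margin F(x) to P(label = 1) under the assumed log loss.

    Written in a numerically stable form so large margins do not overflow.
    """
    z = 2.0 * margin
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def margin_to_class(margin):
    """Hard class label: 1 if the margin is positive, else 0."""
    return 1 if margin > 0 else 0


# Compare the hard label with the probability for a few example margins.
for m in (-1.5, 0.0, 0.3):
    print(m, margin_to_class(m), margin_to_probability(m))
```

Returning the probability rather than only the thresholded class lets the caller choose a decision threshold or compute ranking metrics.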
 
 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/ml/classification/GBTClassifier.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.regression.GBTRegressor
 
 


was (Author: [email protected]):
[~srowen]
GBM Tuning parameters unavailable in Spark 2.1.0: 
1.       Column Sampling Rate: present in H2O & XGBoost, important feature
2.       Regularization on leaf node weights: present in XGBoost
3.       learning rate annealing: present in H2O
 
Other features missing (compared to H2O and/or XGBoost):
4.       Multiclass Classification can’t be done
5.       Offset: present in H2O
6.       Choice of distributions do not include Gamma, Tweedie, Poisson
7.       Generates classes, not probabilities (they said later version will 
take care of this)
 
 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/ml/classification/GBTClassifier.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.regression.GBTRegressor
 
 

> GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter
> ---------------------------------------------------------------------
>
>                 Key: SPARK-20199
>                 URL: https://issues.apache.org/jira/browse/SPARK-20199
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.0
>            Reporter: pralabhkumar
>            Priority: Minor
>
> Spark GradientBoostedTreesModel doesn't have a column sampling rate parameter.
> This parameter is available in H2O and XGBoost.
> Sample from H2O.ai:
> gbmParams._col_sample_rate
> Please provide the parameter.



