[
https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954796#comment-15954796
]
pralabhkumar commented on SPARK-20199:
--------------------------------------
Hi,
GBM internally uses RandomForest to build its trees.
GradientBoostedTrees has a method, boost, which calls the train method of
DecisionTreeRegressor to build each tree:
private[ml] def train(data: RDD[LabeledPoint],
    oldStrategy: OldStrategy): DecisionTreeRegressionModel = {
  val instr = Instrumentation.create(this, data)
  instr.logParams(params: _*)
  val trees = RandomForest.run(data, oldStrategy, numTrees = 1,
    featureSubsetStrategy = "all",
    seed = $(seed), instr = Some(instr), parentUID = Some(uid))
  val m = trees.head.asInstanceOf[DecisionTreeRegressionModel]
  instr.logSuccess(m)
  m
}
Here featureSubsetStrategy is hardcoded to "all". Is there a specific
reason for that? Shouldn't this property be exposed so that the user can choose
the featureSubsetStrategy from "auto", "all", "sqrt", "log2", and "onethird"?
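
To make the request concrete, here is a minimal, self-contained sketch of how the five strategy names could translate into a number of candidate features per split. It does not use Spark; FeatureSubsetSketch, numFeaturesPerNode and sampleFeatures are hypothetical names for illustration only. The "auto" resolution follows the behaviour described in Spark's parameter documentation, while the rounding (ceil) is my assumption and may not match the actual implementation.

```scala
object FeatureSubsetSketch {

  // Hypothetical helper (not a Spark API): how many features are considered
  // at each split for a given featureSubsetStrategy.
  def numFeaturesPerNode(
      strategy: String,
      numFeatures: Int,
      numTrees: Int,
      isClassification: Boolean): Int = {
    // "auto" depends on the number of trees and the task type.
    val resolved = strategy match {
      case "auto" =>
        if (numTrees == 1) "all"
        else if (isClassification) "sqrt"
        else "onethird"
      case other => other
    }
    // Rounding up is an assumption made for this sketch.
    resolved match {
      case "all"      => numFeatures
      case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2"     => math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case other =>
        throw new IllegalArgumentException(s"Unsupported featureSubsetStrategy: $other")
    }
  }

  // Hypothetical helper: draw k distinct feature indices, which is what a
  // column sampling rate would amount to for one split or one tree.
  def sampleFeatures(numFeatures: Int, k: Int, seed: Long): Seq[Int] = {
    val rng = new scala.util.Random(seed)
    rng.shuffle((0 until numFeatures).toList).take(k).sorted
  }
}
```

With the hardcoded "all", every split sees every feature; with something like "sqrt" on 100 features, each split would only consider about 10 of them, which is roughly what a column sampling rate gives in H2O or XGBoost.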
> GradientBoostedTreesModel doesn't have Column Sampling Rate Parameter
> ---------------------------------------------------------------------
>
> Key: SPARK-20199
> URL: https://issues.apache.org/jira/browse/SPARK-20199
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.1.0
> Reporter: pralabhkumar
> Priority: Minor
>
> Spark's GradientBoostedTreesModel doesn't have a column sampling rate
> parameter. This parameter is available in H2O and XGBoost.
> Example from H2O.ai:
> gbmParams._col_sample_rate
> Please provide this parameter.