Thanks for the suggestion. That would be faster, but less accurate in most cases. It's generally better to use a new random sample on each iteration, based on the literature and results I've seen.

Joseph
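To make the trade-off concrete, here is a minimal sketch contrasting the two strategies (this is not Spark's actual implementation; DataPoint, TreeModel, and fitTree are placeholder names standing in for LabeledPoint, the tree model, and DecisionTreeRegressor.train):

import scala.util.Random

case class DataPoint(label: Double, features: Array[Double])
trait TreeModel { def predict(features: Array[Double]): Double }

// Placeholder for the real tree learner.
def fitTree(data: Seq[DataPoint]): TreeModel = ???

// Proposed approach: subsample once up front and reuse the same subset.
// Uses less memory, but every tree sees identical rows, so the trees'
// errors are correlated across iterations.
def boostSampleOnce(input: Seq[DataPoint], rate: Double,
                    numIterations: Int, seed: Long): Seq[TreeModel] = {
  val rng = new Random(seed)
  val fixedSample = input.filter(_ => rng.nextDouble() < rate)
  (0 until numIterations).map(_ => fitTree(fixedSample))
}

// Stochastic gradient boosting (Friedman, 2002): draw a fresh subsample
// for each tree. Each iteration sees different rows, which decorrelates
// the trees and typically improves accuracy.
def boostResample(input: Seq[DataPoint], rate: Double,
                  numIterations: Int, seed: Long): Seq[TreeModel] = {
  (0 until numIterations).map { m =>
    val rng = new Random(seed + m)  // mirrors setSeed(seed + m) in the quoted code
    val freshSample = input.filter(_ => rng.nextDouble() < rate)
    fitTree(freshSample)
  }
}

(The pseudo-residual update is omitted here to keep the focus on the sampling strategy.)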
On Fri, Nov 11, 2016 at 5:13 AM, WangJianfei <wangjianfe...@otcaix.iscas.ac.cn> wrote:

> When we train the model, we use the data with a subsamplingRate, so if
> the subsamplingRate < 1.0, we can do a sample first to reduce the memory
> usage. See the code below in GradientBoostedTrees.boost():
>
>   while (m < numIterations && !doneLearning) {
>     // Update data with pseudo-residuals (residual errors)
>     val data = predError.zip(input).map { case ((pred, _), point) =>
>       LabeledPoint(-loss.gradient(pred, point.label), point.features)
>     }
>
>     timer.start(s"building tree $m")
>     logDebug("###################################################")
>     logDebug("Gradient boosting tree iteration " + m)
>     logDebug("###################################################")
>     val dt = new DecisionTreeRegressor().setSeed(seed + m)
>     val model = dt.train(data, treeStrategy)
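For reference, the change proposed in the quoted message amounts to roughly the following sampling step before the boosting loop (a sketch of the idea, not a tested patch; RDD.sample and treeStrategy.subsamplingRate are standard Spark APIs):

// Sketch of the proposal: sample the input once before the loop and then
// boost over `sampled` instead of `input`.
val sampled =
  if (treeStrategy.subsamplingRate < 1.0) {
    input.sample(withReplacement = false, treeStrategy.subsamplingRate, seed)
  } else {
    input
  }

As Joseph notes above, this trades accuracy for memory, since every iteration would then reuse the same fixed subset rather than drawing a fresh one.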