Thanks for the suggestion.  That would be faster, but less accurate in most
cases.  Based on the literature and results I've seen, it's generally better
to use a new random sample on each iteration: with a single up-front sample,
every tree is fit on the same reduced dataset, whereas resampling per
iteration lets the ensemble see (nearly) all of the data across iterations.
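For illustration, here is a minimal sketch contrasting the two approaches
(hypothetical method and parameter names, not the actual GradientBoostedTrees
internals):

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Hypothetical sketch contrasting the two sampling strategies.
    def sampleStrategies(input: RDD[LabeledPoint], subsamplingRate: Double,
                         numIterations: Int, seed: Long): Unit = {
      // Sample once up front (the proposal): cheaper, but every tree is
      // fit on the same reduced dataset, so the trees' errors correlate.
      val fixedSample =
        input.sample(withReplacement = false, subsamplingRate, seed)

      // Fresh sample each iteration (stochastic gradient boosting):
      // each tree sees a different subset of the full data.
      var m = 0
      while (m < numIterations) {
        val iterSample =
          input.sample(withReplacement = false, subsamplingRate, seed + m)
        // ... compute pseudo-residuals on iterSample and fit tree m ...
        m += 1
      }
    }
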
Joseph

On Fri, Nov 11, 2016 at 5:13 AM, WangJianfei
<wangjianfe...@otcaix.iscas.ac.cn> wrote:

> When we train the model, we use the data with a subsamplingRate, so if the
> subsamplingRate < 1.0, we could do the sampling once first to reduce memory
> usage. See the code below in GradientBoostedTrees.boost():
>
>   while (m < numIterations && !doneLearning) {
>     // Update data with pseudo-residuals (residual errors)
>     val data = predError.zip(input).map { case ((pred, _), point) =>
>       LabeledPoint(-loss.gradient(pred, point.label), point.features)
>     }
>
>     timer.start(s"building tree $m")
>     logDebug("###################################################")
>     logDebug("Gradient boosting tree iteration " + m)
>     logDebug("###################################################")
>     val dt = new DecisionTreeRegressor().setSeed(seed + m)
>     val model = dt.train(data, treeStrategy)
>     // ... (rest of the loop body elided)
>   }
