Re: Reduce the memory usage if we do sample first in GradientBoostedTrees if subsamplingRate < 1.0

WangJianfei Tue, 15 Nov 2016 17:03:12 -0800

With predError.zip(input) we get the RDD data. If we instead sample only
predError or only input first, we can no longer use zip, because zip requires
the same number of elements in each partition. Thank you!
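
A minimal sketch of that constraint, assuming a local Spark session. The helper
name sampleAligned and the toy RDDs are hypothetical stand-ins, not code from
GradientBoostedTrees. It zips while both RDDs still line up element by element
and only then samples the pairs, so each prediction stays next to its point.
Sampling just one side before the zip would change its per-partition counts.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.reflect.ClassTag

object ZipThenSample {
  // Hypothetical helper: zip first (partitions still match), then sample the pairs.
  def sampleAligned[A, B: ClassTag](
      left: RDD[A],
      right: RDD[B],
      rate: Double,
      seed: Long): RDD[(A, B)] =
    left.zip(right).sample(withReplacement = false, fraction = rate, seed = seed)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("zip-then-sample")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy stand-ins for predError and input: same length, same number of partitions.
    val predError = sc.parallelize(Seq(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), numSlices = 2)
    val input = sc.parallelize(Seq("a", "b", "c", "d", "e", "f"), numSlices = 2)

    // Iteration index; adding it to the seed gives a different sample per iteration.
    val m = 0
    sampleAligned(predError, input, rate = 0.5, seed = 42L + m)
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```

Note that this samples after the zip, so it keeps the pairs aligned but does
not by itself avoid materializing the full predError and input RDDs.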





------------------ Original Message ------------------
From: "Joseph Bradley [via Apache Spark Developers List]" <ml-node+s1001551n19899...@n3.nabble.com>
Sent: Wednesday, November 16, 2016, 3:54 AM
To: "WangJianfei" <wangjianfe...@otcaix.iscas.ac.cn>

Subject: Re: Reduce the memory usage if we do sample first in GradientBoostedTrees if subsamplingRate < 1.0



Thanks for the suggestion.  That would be faster, but less accurate in
most cases.  It's generally better to use a new random sample on each
iteration, based on literature and results I've seen.

Joseph
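
A toy illustration of the difference, in plain Scala without Spark. The names
sampleIndices and seenAcrossIterations are hypothetical, and this is not how
Spark implements subsampling. It only shows that drawing a new sample with
seed + m on each iteration (the same pattern as setSeed(seed + m) in the
quoted loop) lets the trees see rows that a single up-front sample would drop
for the whole training run.

```scala
import scala.util.Random

object FreshSamplePerIteration {
  // Bernoulli-sample row indices 0 until n at the given rate, with an explicit seed.
  def sampleIndices(n: Int, rate: Double, seed: Long): Set[Int] = {
    val rng = new Random(seed)
    (0 until n).filter(_ => rng.nextDouble() < rate).toSet
  }

  // Union of the rows seen when a new sample is drawn on every iteration m.
  def seenAcrossIterations(n: Int, rate: Double, seed: Long, numIterations: Int): Set[Int] =
    (0 until numIterations)
      .map(m => sampleIndices(n, rate, seed + m))
      .foldLeft(Set.empty[Int])(_ union _)

  def main(args: Array[String]): Unit = {
    val n = 1000
    val rate = 0.5
    val seed = 42L

    // Sample once up front: every iteration trains on this same subset.
    val sampledOnce = sampleIndices(n, rate, seed)
    // Sample on each iteration: different rows are visible to different trees.
    val sampledFresh = seenAcrossIterations(n, rate, seed, numIterations = 10)

    println(s"rows seen with one up-front sample: ${sampledOnce.size}")
    println(s"rows seen with a fresh sample per iteration: ${sampledFresh.size}")
  }
}
```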


On Fri, Nov 11, 2016 at 5:13 AM, WangJianfei <[hidden email]> wrote:
When we train the model, we use the data with a subsamplingRate, so if
 subsamplingRate < 1.0, we can do a sample first to reduce the memory usage.
 See the code below in GradientBoostedTrees.boost():
 
  while (m < numIterations && !doneLearning) {
       // Update data with pseudo-residuals (residual error)
       val data = predError.zip(input).map { case ((pred, _), point) =>
         LabeledPoint(-loss.gradient(pred, point.label), point.features)
       }
 
       timer.start(s"building tree $m")
       logDebug("###################################################")
       logDebug("Gradient boosting tree iteration " + m)
       logDebug("###################################################")
       val dt = new DecisionTreeRegressor().setSeed(seed + m)
       val model = dt.train(data, treeStrategy)
 
 
 
 
 
 --
 View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Reduce-the-memory-usage-if-we-do-same-first-in-GradientBoostedTrees-if-subsamplingRate-1-0-tp19826.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Reduce-the-memory-usage-if-we-do-same-first-inGradientBoostedTrees-if-subsamplingRate-1-0-tp19904.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
