when we train the mode, we will use the data with a subSampleRate, so if the
subSampleRate < 1.0 , we can do a sample first to reduce the memory usage.
se the code below in GradientBoostedTrees.boost()

 while (m < numIterations && !doneLearning) {
      // Update data with pseudo-residuals 剩余误差
      val data = predError.zip(input).map { case ((pred, _), point) =>
        LabeledPoint(-loss.gradient(pred, point.label), point.features)
      }

      timer.start(s"building tree $m")
      logDebug("###################################################")
      logDebug("Gradient boosting tree iteration " + m)
      logDebug("###################################################")
      val dt = new DecisionTreeRegressor().setSeed(seed + m)
      val model = dt.train(data, treeStrategy)
   




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Reduce-the-memory-usage-if-we-do-same-first-in-GradientBoostedTrees-if-subsamplingRate-1-0-tp19826.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Reply via email to