predError.zip(input) gives us an RDD of paired elements, so one option would be to sample only predError or only input first. But if we do that, we can no longer use zip, because zip requires both RDDs to have the same number of elements in each partition. Thank you!
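
To make the constraint concrete, here is a minimal sketch of sampling after the zip instead of before it. The object name ZipThenSample, the helper sampleAfterZip, the element types, and the toy data are all assumptions made for illustration; this is not the actual GradientBoostedTrees code.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ZipThenSample {
  // Hypothetical helper: zip while both RDDs still line up element for
  // element, then sample the resulting pairs so each pair stays intact.
  def sampleAfterZip(
      predError: RDD[(Double, Double)],
      input: RDD[Double],
      fraction: Double,
      seed: Long): RDD[((Double, Double), Double)] =
    predError.zip(input).sample(withReplacement = false, fraction = fraction, seed = seed)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("zip-then-sample")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy stand-ins: predError is derived from input with map, so both RDDs
    // have the same partitioning and the same element count per partition.
    val input = sc.parallelize(1 to 1000, numSlices = 4).map(_.toDouble)
    val predError = input.map(x => (x, 0.0))

    val sampled = sampleAfterZip(predError, input, fraction = 0.5, seed = 42L)
    println(s"kept ${sampled.count()} of ${input.count()} pairs")

    // Sampling only one side first changes the per-partition element counts,
    // so evaluating the zip below is expected to fail at runtime:
    // val sampledInput = input.sample(withReplacement = false, fraction = 0.5, seed = 42L)
    // predError.zip(sampledInput).count()

    spark.stop()
  }
}
```

Note that this keeps the zip valid but does not by itself reduce the work of computing predError for every element, which was the original motivation for sampling first.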