This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 865c88f  [MINOR][DOC] Add note regarding proper usage of QueryExecution.toRdd
865c88f is described below

commit 865c88f9c735b15dd1a0d275533f086665e8abd8
Author: Jungtaek Lim (HeartSaVioR) <kabh...@gmail.com>
AuthorDate: Tue Feb 19 09:42:21 2019 +0800

    [MINOR][DOC] Add note regarding proper usage of QueryExecution.toRdd

    ## What changes were proposed in this pull request?

    This proposes adding a note on `QueryExecution.toRdd` regarding Spark's internal optimization callers would need to indicate.

    ## How was this patch tested?

    This patch is a documentation change.

    Closes #23822 from HeartSaVioR/MINOR-doc-add-note-query-execution-to-rdd.

    Authored-by: Jungtaek Lim (HeartSaVioR) <kabh...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../scala/org/apache/spark/sql/execution/QueryExecution.scala | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
index 72499aa..49d6acf 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
@@ -85,7 +85,16 @@ class QueryExecution(
     prepareForExecution(sparkPlan)
   }
 
-  /** Internal version of the RDD. Avoids copies and has no schema */
+  /**
+   * Internal version of the RDD. Avoids copies and has no schema.
+   * Note for callers: Spark may apply various optimization including reusing object: this means
+   * the row is valid only for the iteration it is retrieved. You should avoid storing row and
+   * accessing after iteration. (Calling `collect()` is one of known bad usage.)
+   * If you want to store these rows into collection, please apply some converter or copy row
+   * which produces new object per iteration.
+   * Given QueryExecution is not a public class, end users are discouraged to use this: please
+   * use `Dataset.rdd` instead where conversion will be applied.
+   */
   lazy val toRdd: RDD[InternalRow] = executedPlan.execute()
 
   /**
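
The object-reuse hazard described in the new doc comment can be illustrated without Spark. The following is a minimal sketch, not Spark code: `MutableRow`, `RowReuseSketch`, and its methods are hypothetical names invented for this illustration, and the iterator only mimics an operator that hands back the same mutable object on every step. In Spark itself the analogous fix would be to call `copy()` on each `InternalRow` (or to go through `Dataset.rdd`, as the comment recommends) before storing rows in a collection.

```scala
// Hypothetical stand-in for a mutable row; this is NOT Spark's InternalRow.
final class MutableRow(var value: Int) {
  // Produces a fresh object holding the current value.
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseSketch {
  // Mimics an operator that reuses one row object for the whole iteration:
  // every call to next() overwrites and returns the same instance.
  def reusingIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v =>
      shared.value = v
      shared
    }
  }

  // Bad usage: stores references to the shared object, so every stored
  // element ends up showing the last value written.
  def storeWithoutCopy(values: Seq[Int]): List[Int] =
    reusingIterator(values).toList.map(_.value)

  // Safe usage: copies each row while it is still valid, i.e. during the
  // iteration step that produced it.
  def storeWithCopy(values: Seq[Int]): List[Int] =
    reusingIterator(values).map(_.copy()).toList.map(_.value)

  def main(args: Array[String]): Unit = {
    println(storeWithoutCopy(Seq(1, 2, 3))) // prints List(3, 3, 3)
    println(storeWithCopy(Seq(1, 2, 3)))    // prints List(1, 2, 3)
  }
}
```

Copying costs one allocation per row, which is exactly what the internal RDD avoids by default. That is why the comment leaves the decision to callers instead of copying unconditionally.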