This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new e94bb505b897 [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch`
e94bb505b897 is described below

commit e94bb505b897fe5fd8f91fb680a4989cd1fe72fe
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Thu Apr 11 11:25:49 2024 +0900

    [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch`

    ### What changes were proposed in this pull request?

    This PR fixes the documentation of `spark.sql.execution.arrow.maxRecordsPerBatch` to clarify the relation between `spark.sql.execution.arrow.maxRecordsPerBatch` and grouping APIs such as `DataFrame(.cogroup).groupby.applyInPandas`.

    ### Why are the changes needed?

    To address confusion about them.

    ### Does this PR introduce _any_ user-facing change?

    Yes, it fixes the user-facing SQL configuration page https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration

    ### How was this patch tested?

    CI in this PR should verify them. I ran linters.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45993 from HyukjinKwon/minor-doc-change.
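The behavior the doc change describes can be sketched in plain Python. This is a simulation of the batching rule only, not Spark's actual Arrow writer: for non-grouped Arrow exchange, `maxRecordsPerBatch` caps each ArrowRecordBatch, while for `groupby(...).applyInPandas` each group is serialized as one batch, so the cap does not apply. The function names and the sample rows below are illustrative, not Spark APIs.

```python
# Sketch (not Spark code): how records end up in Arrow batches.

def batches_without_grouping(records, max_records_per_batch):
    """Non-grouped path: slice records into chunks of at most
    max_records_per_batch; zero or negative means no limit."""
    if max_records_per_batch <= 0:
        return [records]
    return [records[i:i + max_records_per_batch]
            for i in range(0, len(records), max_records_per_batch)]

def batches_with_grouping(records, key):
    """Grouping path (applyInPandas-like): one batch per group,
    regardless of maxRecordsPerBatch."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return list(groups.values())

rows = [{"id": i, "g": i % 2} for i in range(10)]

# With a cap of 3, ten rows split into 4 batches.
print(len(batches_without_grouping(rows, 3)))
# Grouped by "g", the same rows form exactly 2 batches, one per group.
print(len(batches_with_grouping(rows, lambda r: r["g"])))
```

A practical consequence, as the new doc text notes: with the grouping APIs a single very large group yields a single very large batch, and lowering `spark.sql.execution.arrow.maxRecordsPerBatch` will not shrink it.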
Authored-by: Hyukjin Kwon <gurwls...@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
(cherry picked from commit 6c8e4cfd6f3f95455b0d4479f2527d425349f1cf)
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 951a54a15cbc..be9a7c82828e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2800,7 +2800,9 @@ object SQLConf {
   val ARROW_EXECUTION_MAX_RECORDS_PER_BATCH =
     buildConf("spark.sql.execution.arrow.maxRecordsPerBatch")
       .doc("When using Apache Arrow, limit the maximum number of records that can be written " +
-        "to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit.")
+        "to a single ArrowRecordBatch in memory. This configuration is not effective for the " +
+        "grouping API such as DataFrame(.cogroup).groupby.applyInPandas because each group " +
+        "becomes each ArrowRecordBatch. If set to zero or negative there is no limit.")
       .version("2.3.0")
       .intConf
       .createWithDefault(10000)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org