This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 6c8e4cfd6f3f [MINOR][DOCS] Clarify relation between grouping API and
`spark.sql.execution.arrow.maxRecordsPerBatch`
6c8e4cfd6f3f is described below
commit 6c8e4cfd6f3f95455b0d4479f2527d425349f1cf
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Thu Apr 11 11:25:49 2024 +0900
[MINOR][DOCS] Clarify relation between grouping API and
`spark.sql.execution.arrow.maxRecordsPerBatch`
### What changes were proposed in this pull request?
This PR fixes the documentation of
`spark.sql.execution.arrow.maxRecordsPerBatch` to clarify the relation between
`spark.sql.execution.arrow.maxRecordsPerBatch` and grouping APIs such as
`DataFrame(.cogroup).groupby.applyInPandas`.
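To make the documented behavior concrete, here is a minimal pure-Python sketch (illustrative only, not Spark internals) of the two batching strategies: a plain Arrow conversion splits records into batches capped by `spark.sql.execution.arrow.maxRecordsPerBatch`, while grouping APIs emit one batch per group regardless of the limit. The function names are hypothetical.

```python
def batches_for_scan(records, max_records_per_batch):
    """Plain DataFrame conversion: split records into batches of at most
    max_records_per_batch rows (mirrors the config's documented behavior)."""
    if max_records_per_batch <= 0:
        # Zero or negative means no limit: everything in one batch.
        return [records]
    return [records[i:i + max_records_per_batch]
            for i in range(0, len(records), max_records_per_batch)]

def batches_for_grouping(groups):
    """Grouping APIs (e.g. applyInPandas): each group becomes its own
    batch, so the max-records-per-batch limit does not apply."""
    return [list(group) for group in groups]
```

For example, 25 records with a limit of 10 yield three batches under the plain conversion, whereas two groups of sizes 2 and 15 always yield exactly two batches, even with the same limit configured.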
### Why are the changes needed?
To address confusion about them.
### Does this PR introduce _any_ user-facing change?
Yes, it fixes the user-facing SQL configuration page
https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration
### How was this patch tested?
CI in this PR should verify them. I ran linters.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45993 from HyukjinKwon/minor-doc-change.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 27fba0b19f48..55d8b61f8b94 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3016,7 +3016,9 @@ object SQLConf {
val ARROW_EXECUTION_MAX_RECORDS_PER_BATCH =
buildConf("spark.sql.execution.arrow.maxRecordsPerBatch")
     .doc("When using Apache Arrow, limit the maximum number of records that can be written " +
-      "to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit.")
+      "to a single ArrowRecordBatch in memory. This configuration is not effective for the " +
+      "grouping API such as DataFrame(.cogroup).groupby.applyInPandas because each group " +
+      "becomes each ArrowRecordBatch. If set to zero or negative there is no limit.")
.version("2.3.0")
.intConf
.createWithDefault(10000)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]