This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new e94bb505b897 [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch`
e94bb505b897 is described below

commit e94bb505b897fe5fd8f91fb680a4989cd1fe72fe
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Thu Apr 11 11:25:49 2024 +0900

    [MINOR][DOCS] Clarify relation between grouping API and `spark.sql.execution.arrow.maxRecordsPerBatch`

    ### What changes were proposed in this pull request?

    This PR fixes the documentation of `spark.sql.execution.arrow.maxRecordsPerBatch` to clarify the relation between `spark.sql.execution.arrow.maxRecordsPerBatch` and grouping APIs such as `DataFrame(.cogroup).groupby.applyInPandas`.

    ### Why are the changes needed?

    To address confusion about them.

    ### Does this PR introduce _any_ user-facing change?

    Yes, it fixes the user-facing SQL configuration page https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration

    ### How was this patch tested?

    CI in this PR should verify them. I ran linters.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45993 from HyukjinKwon/minor-doc-change.
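The behavior the doc change describes can be sketched in plain Python. This is a simulation of the batching rule only, not Spark's actual Arrow writer: for non-grouped Arrow exchange, `maxRecordsPerBatch` caps each ArrowRecordBatch, while for `groupby(...).applyInPandas` each group is serialized as one batch, so the cap does not apply. The function names and the sample rows below are illustrative, not Spark APIs.

```python
# Sketch (not Spark code): how records end up in Arrow batches.

def batches_without_grouping(records, max_records_per_batch):
    """Non-grouped path: slice records into chunks of at most
    max_records_per_batch; zero or negative means no limit."""
    if max_records_per_batch <= 0:
        return [records]
    return [records[i:i + max_records_per_batch]
            for i in range(0, len(records), max_records_per_batch)]

def batches_with_grouping(records, key):
    """Grouping path (applyInPandas-like): one batch per group,
    regardless of maxRecordsPerBatch."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r)
    return list(groups.values())

rows = [{"id": i, "g": i % 2} for i in range(10)]

# With a cap of 3, ten rows split into 4 batches.
print(len(batches_without_grouping(rows, 3)))
# Grouped by "g", the same rows form exactly 2 batches, one per group.
print(len(batches_with_grouping(rows, lambda r: r["g"])))
```

A practical consequence, as the new doc text notes: with the grouping APIs a single very large group yields a single very large batch, and lowering `spark.sql.execution.arrow.maxRecordsPerBatch` will not shrink it.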
Authored-by: Hyukjin Kwon <gurwls...@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
(cherry picked from commit 6c8e4cfd6f3f95455b0d4479f2527d425349f1cf)
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 951a54a15cbc..be9a7c82828e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2800,7 +2800,9 @@ object SQLConf {
   val ARROW_EXECUTION_MAX_RECORDS_PER_BATCH =
     buildConf("spark.sql.execution.arrow.maxRecordsPerBatch")
       .doc("When using Apache Arrow, limit the maximum number of records that can be written " +
-        "to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit.")
+        "to a single ArrowRecordBatch in memory. This configuration is not effective for the " +
+        "grouping API such as DataFrame(.cogroup).groupby.applyInPandas because each group " +
+        "becomes each ArrowRecordBatch. If set to zero or negative there is no limit.")
       .version("2.3.0")
       .intConf
       .createWithDefault(10000)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org