This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 6c8e4cfd6f3f [MINOR][DOCS] Clarify relation between grouping API and
`spark.sql.execution.arrow.maxRecordsPerBatch`
6c8e4cfd6f3f is described below
commit 6c8e4cfd6f3f95455b0d4479f2527d425349f1cf
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Thu Apr 11 11:25:49 2024 +0900
[MINOR][DOCS] Clarify relation between grouping API and
`spark.sql.execution.arrow.maxRecordsPerBatch`
### What changes were proposed in this pull request?
This PR fixes the documentation of
`spark.sql.execution.arrow.maxRecordsPerBatch` to clarify the relation between
`spark.sql.execution.arrow.maxRecordsPerBatch` and grouping APIs such as
`DataFrame(.cogroup).groupby.applyInPandas`.
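To make the documented behavior concrete, here is a minimal pure-Python sketch (illustrative only, not Spark internals) of the two batching strategies: a plain Arrow conversion splits records into batches capped by `spark.sql.execution.arrow.maxRecordsPerBatch`, while grouping APIs emit one batch per group regardless of the limit. The function names are hypothetical.

```python
def batches_for_scan(records, max_records_per_batch):
    """Plain DataFrame conversion: split records into batches of at most
    max_records_per_batch rows (mirrors the config's documented behavior)."""
    if max_records_per_batch <= 0:
        # Zero or negative means no limit: everything in one batch.
        return [records]
    return [records[i:i + max_records_per_batch]
            for i in range(0, len(records), max_records_per_batch)]

def batches_for_grouping(groups):
    """Grouping APIs (e.g. applyInPandas): each group becomes its own
    batch, so the max-records-per-batch limit does not apply."""
    return [list(group) for group in groups]
```

For example, 25 records with a limit of 10 yield three batches under the plain conversion, whereas two groups of sizes 2 and 15 always yield exactly two batches, even with the same limit configured.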
### Why are the changes needed?
To address confusion about them.
### Does this PR introduce _any_ user-facing change?
Yes, it fixes the user-facing SQL configuration page
https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration
### How was this patch tested?
CI in this PR should verify them. I ran linters.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45993 from HyukjinKwon/minor-doc-change.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 27fba0b19f48..55d8b61f8b94 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3016,7 +3016,9 @@ object SQLConf {
val ARROW_EXECUTION_MAX_RECORDS_PER_BATCH =
buildConf("spark.sql.execution.arrow.maxRecordsPerBatch")
     .doc("When using Apache Arrow, limit the maximum number of records that can be written " +
-      "to a single ArrowRecordBatch in memory. If set to zero or negative there is no limit.")
+      "to a single ArrowRecordBatch in memory. This configuration is not effective for the " +
+      "grouping API such as DataFrame(.cogroup).groupby.applyInPandas because each group " +
+      "becomes each ArrowRecordBatch. If set to zero or negative there is no limit.")
.version("2.3.0")
.intConf
.createWithDefault(10000)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]