This is an automated email from the ASF dual-hosted git repository.
kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b969ffe5d816 [SPARK-50752][SQL][PYSPARK][FOLLOWUP] Respect
PYTHON_UDF_MAX_RECORDS_PER_BATCH in Python worker
b969ffe5d816 is described below
commit b969ffe5d81616151efad0bfef2a345aa51f988b
Author: Jungtaek Lim <[email protected]>
AuthorDate: Mon Jun 9 14:37:14 2025 +0900
[SPARK-50752][SQL][PYSPARK][FOLLOWUP] Respect
PYTHON_UDF_MAX_RECORDS_PER_BATCH in Python worker
### What changes were proposed in this pull request?
This PR is a follow-up of SPARK-50752 (PR #49397), which introduced
PYTHON_UDF_MAX_RECORDS_PER_BATCH into SQLConf to control the batch size of
normal Python UDF.
This PR is to fix the config to be effective for Python worker side as well.
### Why are the changes needed?
The original PR enabled the control of batch size for JVM side, but it
wasn't properly propagated to Python worker since we missed to override the
value for Python UDF. (Default value before overriding is a static one, 100)
### Does this PR introduce _any_ user-facing change?
Yes, but it's about tuning and does not impact any change on the output.
### How was this patch tested?
No automated test, since there is no good way to test this properly since
missing this does not change the output itself.
This was manually tested with artificial debug logging.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #51124 from
HeartSaVioR/SPARK-50752-FOLLOWUP-respect-PYTHON_UDF_MAX_RECORDS_PER_BATCH-in-python-worker.
Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
---
.../scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala | 2 ++
1 file changed, 2 insertions(+)
diff --git
a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
index 11773f92c482..4baddcd4d9e7 100644
---
a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
+++
b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala
@@ -50,6 +50,8 @@ abstract class BasePythonUDFRunner(
override val killOnIdleTimeout: Boolean =
SQLConf.get.pythonUDFWorkerKillOnIdleTimeout
override val bufferSize: Int =
SQLConf.get.getConf(SQLConf.PYTHON_UDF_BUFFER_SIZE)
+ override val batchSizeForPythonUDF: Int =
+ SQLConf.get.getConf(SQLConf.PYTHON_UDF_MAX_RECORDS_PER_BATCH)
protected def writeUDF(dataOut: DataOutputStream): Unit
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]