This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e4d60e9a9778 [SPARK-53563][PS] Optimize: sql_processor by avoiding inefficient string concatenation
e4d60e9a9778 is described below

commit e4d60e9a977896432feed490a6944763b70d91eb
Author: Peter Nguyen <petern0...@gmail.com>
AuthorDate: Mon Sep 15 07:51:12 2025 +0900

    [SPARK-53563][PS] Optimize: sql_processor by avoiding inefficient string concatenation
    
    ### What changes were proposed in this pull request?
    
    Improves performance in `sql_processor` by accumulating string fragments in a list and joining them once, instead of repeatedly concatenating immutable Python strings. Addresses a "TODO" comment in the code.
    
    ### Why are the changes needed?
    
    Performance improvement: because Python strings are immutable, each `res = res + ...` step copies the accumulated string, making the loop quadratic in the total output length in the worst case; appending to a list and joining once is linear.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Passes existing tests
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #52322 from petern48/optim_string_builder.
    
    Authored-by: Peter Nguyen <petern0...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/sql_processor.py | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/pandas/sql_processor.py b/python/pyspark/pandas/sql_processor.py
index e24c369cd43f..8437ff6e48cf 100644
--- a/python/pyspark/pandas/sql_processor.py
+++ b/python/pyspark/pandas/sql_processor.py
@@ -293,13 +293,12 @@ class SQLProcessor:
         0   True  False
         """
         blocks = _string.formatter_parser(self._statement)
-        # TODO: use a string builder
-        res = ""
+        res = []
         try:
             for pre, inner, _, _ in blocks:
                 var_next = "" if inner is None else self._convert(inner)
-                res = res + pre + var_next
-            self._normalized_statement = res
+                res.append(pre + var_next)
+            self._normalized_statement = "".join(res)
 
             sdf = self._session.sql(self._normalized_statement)
         finally:


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
