This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.1 by this push:
     new 5b372d8ab214 [SPARK-54234][PYTHON][CONNECT] No need to attach PlanId to grouping column names in df.groupBy
5b372d8ab214 is described below

commit 5b372d8ab2146f329e2956829821dfdde9ce3d9d
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Fri Nov 7 08:15:56 2025 -0800

    [SPARK-54234][PYTHON][CONNECT] No need to attach PlanId to grouping column names in df.groupBy
    
    ### What changes were proposed in this pull request?
    Stop attaching a PlanId to string grouping column names in df.groupBy: resolve them with F.col(name) instead of df[name].
    
    ### Why are the changes needed?
    To be more consistent with classic mode, which resolves string grouping keys as plain columns; see the equivalent code path in the classic Dataset:
    
    
https://github.com/apache/spark/blob/e75ca577923f9f465eb06b4df814c00143fa41ea/sql/api/src/main/scala/org/apache/spark/sql/Dataset.scala#L1318-L1320
    
    ### Does this PR introduce _any_ user-facing change?
    no
    
    ### How was this patch tested?
    ci
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #52933 from zhengruifeng/connect_group_key_relax.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
    (cherry picked from commit 37997cef6dfff7a0c093f75e720d6792e1eafefc)
    Signed-off-by: Wenchen Fan <[email protected]>
---
 python/pyspark/sql/connect/dataframe.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 71a499afd2ff..862974f11165 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -587,7 +587,7 @@ class DataFrame(ParentDataFrame):
             if isinstance(c, Column):
                 _cols.append(c)
             elif isinstance(c, str):
-                _cols.append(self[c])
+                _cols.append(F.col(c))
             elif isinstance(c, int) and not isinstance(c, bool):
                 if c < 1:
                     raise PySparkIndexError(


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
