[jira] [Created] (SPARK-43513) withColumnRenamed duplicate columns if new column already exist

Frederik Paradis (Jira) Mon, 15 May 2023 13:59:06 -0700

Frederik Paradis created SPARK-43513:
----------------------------------------


             Summary: withColumnRenamed duplicate columns if new column already exist
                 Key: SPARK-43513
                 URL: https://issues.apache.org/jira/browse/SPARK-43513
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Frederik Paradis


withColumnRenamed should either replace the existing column when the new column 
name already exists, or the documentation should describe the current behavior 
(a DataFrame with two columns of the same name). See the code below for an 
example of the current state.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()

df = spark.createDataFrame(
    [(1, 0.5, 0.4), (2, 0.5, 0.8)],
    ["id", "score", "test_score"],
)
r = df.withColumnRenamed("test_score", "score")
print(r)  # DataFrame[id: bigint, score: double, score: double]

# pyspark.sql.utils.AnalysisException: Reference 'score' is ambiguous, could be: score, score.
print(r.select("score"))
{code}
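
Until the behavior is changed or documented, one possible workaround is to drop 
the column that already holds the target name before renaming. The sketch below 
is only an illustration: rename_replacing is a hypothetical helper name, not 
part of PySpark, and it assumes the plain case-sensitive comparison against 
df.columns is good enough (Spark resolves column names case-insensitively by 
default, so names differing only in case are not handled here).

```python
def rename_replacing(df, existing, new):
    """Rename `existing` to `new`, replacing any column already named `new`.

    Hypothetical helper, not a PySpark API. `df` is expected to be a
    pyspark.sql.DataFrame; only `columns`, `drop` and `withColumnRenamed`
    are used.
    """
    if existing == new or existing not in df.columns:
        # Nothing to rename: withColumnRenamed is a no-op in these cases,
        # so return early and avoid dropping a column for no reason.
        return df
    if new in df.columns:
        # Drop the column that currently holds the target name so the
        # rename cannot produce two columns called `new`.
        df = df.drop(new)
    return df.withColumnRenamed(existing, new)
```

With the df from the report, rename_replacing(df, "test_score", "score") should 
leave a single score column holding the former test_score values, so that 
select("score") is no longer ambiguous. Whether the old score column ought to be 
discarded silently is a design choice; raising an error instead would be an 
equally reasonable option.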


