[
https://issues.apache.org/jira/browse/SPARK-43513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frederik Paradis updated SPARK-43513:
-------------------------------------
Summary: withColumnRenamed duplicates columns if new column already exists
(was: withColumnRenamed duplicates columns if new column already exist)
> withColumnRenamed duplicates columns if new column already exists
> -----------------------------------------------------------------
>
> Key: SPARK-43513
> URL: https://issues.apache.org/jira/browse/SPARK-43513
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Frederik Paradis
> Priority: Major
>
> withColumnRenamed should either replace the existing column when the new
> column name already exists, or the documentation should describe this
> behavior. The code below shows the current behavior.
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
> df = spark.createDataFrame([(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score",
> "test_score"])
> r = df.withColumnRenamed("test_score", "score")
> print(r) # DataFrame[id: bigint, score: double, score: double]
> # The select below raises pyspark.sql.utils.AnalysisException:
> # Reference 'score' is ambiguous, could be: score, score.
> print(r.select("score"))
> {code}
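
Until the behavior is changed or documented, one possible workaround is to drop the clashing column before renaming. The helper below is only a sketch: rename_replacing is a hypothetical name, not part of PySpark, and it assumes that discarding the existing column is what the caller wants. It relies only on DataFrame.columns, DataFrame.drop and DataFrame.withColumnRenamed, and it compares names exactly, so it does not account for Spark's default case-insensitive column resolution.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type hints; the helper itself is duck-typed.
    from pyspark.sql import DataFrame


def rename_replacing(df: DataFrame, existing: str, new: str) -> DataFrame:
    """Rename `existing` to `new`, dropping any column already named `new`.

    Hypothetical helper, not a PySpark API. Names are compared exactly.
    """
    if existing == new or existing not in df.columns:
        # Nothing to rename; withColumnRenamed is also a no-op when the
        # source column is absent, so return the input unchanged.
        return df
    if new in df.columns:
        # Remove the clashing column so the result has a single `new` column.
        df = df.drop(new)
    return df.withColumnRenamed(existing, new)
```

With the df from the report, rename_replacing(df, "test_score", "score") should leave only the columns id and score, so a later select("score") is no longer ambiguous. Whether the old score values should be silently discarded like this is exactly the design question the issue raises.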
--
This message was sent by Atlassian Jira
(v8.20.10#820010)