Frederik Paradis created SPARK-43513:
----------------------------------------
Summary: withColumnRenamed duplicates columns if the new column already exists
Key: SPARK-43513
URL: https://issues.apache.org/jira/browse/SPARK-43513
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.4.0
Reporter: Frederik Paradis
withColumnRenamed should either replace the existing column when the new column
name already exists, or the documentation should describe this behavior. The
code below shows the current behavior.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("local-spark-session").getOrCreate()
df = spark.createDataFrame([(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score", "test_score"])
r = df.withColumnRenamed("test_score", "score")
print(r)  # DataFrame[id: bigint, score: double, score: double]
# The next line raises:
# pyspark.sql.utils.AnalysisException: Reference 'score' is ambiguous, could be: score, score.
print(r.select("score"))
{code}
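Until the behavior is changed or documented, one possible workaround is to drop the conflicting column before renaming. The sketch below assumes that replacing the existing column is the desired semantics; `rename_replacing` is a hypothetical helper written for illustration, not part of the PySpark API.

{code:python}
from pyspark.sql import DataFrame, SparkSession


def rename_replacing(df: DataFrame, existing: str, new: str) -> DataFrame:
    """Rename `existing` to `new`, dropping any column already named `new`.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    # Mirror withColumnRenamed's no-op behavior when the source column is
    # missing or the names are identical (dropping here would lose data).
    if existing == new or existing not in df.columns:
        return df
    # Exact-name check; Spark resolves names case-insensitively by default,
    # so names that differ only in case are not handled by this sketch.
    if new in df.columns:
        df = df.drop(new)
    return df.withColumnRenamed(existing, new)


spark = (
    SparkSession.builder.master("local[1]")
    .appName("local-spark-session")
    .getOrCreate()
)
df = spark.createDataFrame(
    [(1, 0.5, 0.4), (2, 0.5, 0.8)], ["id", "score", "test_score"]
)
r = rename_replacing(df, "test_score", "score")
print(r.columns)  # ['id', 'score']
r.select("score").show()  # no ambiguity: only one 'score' column remains
{code}

Dropping first keeps the result to a single column per name, so later references such as select("score") resolve unambiguously. The same effect can be had with an explicit select and alias if column order matters.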