[ https://issues.apache.org/jira/browse/SPARK-16464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370412#comment-15370412 ]
Dongjoon Hyun commented on SPARK-16464:
---------------------------------------
I see your point.
You mean the behavior where `withColumn` is allowed to override an existing
column name.
> withColumn() allows illegal creation of duplicate column names on DataFrame
> ---------------------------------------------------------------------------
>
> Key: SPARK-16464
> URL: https://issues.apache.org/jira/browse/SPARK-16464
> Project: Spark
> Issue Type: Bug
> Components: SparkR, SQL
> Affects Versions: 1.6.1
> Environment: Databricks.com
> Reporter: Neil Dewar
> Priority: Minor
>
> If I take an existing DataFrame, I am permitted to use withColumn() to create
> a duplicate column name. I assume this should be illegal, and that withColumn
> should prevent it. Some functions subsequently fail because of the duplicate
> column names. Example:
> sdfCar <- createDataFrame(sqlContext, mtcars)
> sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg <= 20)
> # Reusing the name "isEfficient" leaves two columns with that name
> sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg, 1, 0))
> sdfCar2 <- subset(sdfCar1, select = sdfCar1$isEfficient)
> # subset() fails with the message: "Reference 'isEfficient' is ambiguous"
> Note: I have only seen this in SparkR; it might affect the other language APIs.