[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459601#comment-17459601 ]
Nicholas Chammas commented on SPARK-24853:
------------------------------------------

Assuming we are talking about the example I provided: Yes, {{col("count")}} would still be ambiguous. I don't know if Spark would know to catch that problem.

But note that the current behavior of {{.withColumnRenamed('count', ...)}} renames all columns named "count", which is just incorrect. So allowing {{col("count")}} will either be just as incorrect as the current behavior, or it will be an improvement in that Spark may complain that the column reference is ambiguous. I'd have to try it to confirm the behavior.

Of course, the main improvement offered by {{Column}} references is that users can do something like {{.withColumnRenamed(left_counts['count'], ...)}} and get the correct behavior.

I didn't follow what you are getting at regarding {{from_json}}, but does that address your concern?

> Support Column type for withColumn and withColumnRenamed apis
> -------------------------------------------------------------
>
>                 Key: SPARK-24853
>                 URL: https://issues.apache.org/jira/browse/SPARK-24853
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.2, 3.2.0
>            Reporter: nirav patel
>            Priority: Minor
>
> Can we add overloaded version of withColumn or withColumnRenamed that accept
> Column type instead of String? That way I can specify FQN in case when there
> is duplicate column names. e.g. if I have 2 columns with same name as a
> result of join and I want to rename one of the field I can do it with this
> new API.
>
> This would be similar to Drop api which supports both String and Column type.
>
> def
> withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column
> that has the same name.
>
> def
> withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>
>
> I think there should also be this one:
>
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
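
The comment above contrasts renaming by name, which hits every column called "count", with renaming by {{Column}} reference, which should hit exactly one. A rough way to picture the difference is the toy model below. It is plain Python and not Spark code: {{Col}}, {{new_col}}, {{rename_by_name}} and {{rename_by_ref}} are hypothetical names invented for illustration, and the "unique id per column" idea is only an assumption about how a reference could be told apart from a same-named column.

```python
from dataclasses import dataclass, replace
from itertools import count

_ids = count()  # source of unique ids for the toy columns


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a column: a display name plus a unique id."""
    name: str
    uid: int


def new_col(name):
    # Every call yields a distinct column, even if the name repeats.
    return Col(name, next(_ids))


def rename_by_name(schema, existing, new):
    # Models the behavior described in the comment:
    # every column whose name matches is renamed.
    return [replace(c, name=new) if c.name == existing else c for c in schema]


def rename_by_ref(schema, existing, new):
    # Models the proposed behavior: match on identity,
    # so only the referenced column is renamed.
    return [replace(c, name=new) if c.uid == existing.uid else c for c in schema]


# Two "count" columns, as after joining two aggregated tables.
left_count = new_col("count")
right_count = new_col("count")
joined = [new_col("key"), left_count, right_count]

print([c.name for c in rename_by_name(joined, "count", "left_count")])
# ['key', 'left_count', 'left_count']
print([c.name for c in rename_by_ref(joined, left_count, "left_count")])
# ['key', 'left_count', 'count']
```

In this model a bare name such as "count" cannot say which of the two columns is meant, which is the same ambiguity raised for {{col("count")}}; only a reference that carries identity, like {{left_counts['count']}} in the comment, picks out a single column.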