[ 
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459601#comment-17459601
 ] 

Nicholas Chammas commented on SPARK-24853:
------------------------------------------

Assuming we are talking about the example I provided: Yes, {{col("count")}} 
would still be ambiguous.

I don't know if Spark would know to catch that problem. But note that the 
current behavior of {{.withColumnRenamed('count', ...)}} renames all columns 
named "count", which is just incorrect.

So allowing {{col("count")}} will either be just as incorrect as the current 
behavior, or it will be an improvement in that Spark may complain that the 
column reference is ambiguous. I'd have to try it to confirm the behavior.

Of course, the main improvement offered by {{Column}} references is that users 
can do something like {{.withColumnRenamed(left_counts['count'], ...)}} and get 
the correct behavior.
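
To make the difference concrete, here is a toy model in plain Python. It is not Spark code and does not reflect how Spark resolves columns internally; {{Col}}, {{uid}}, {{make_cols}}, {{rename_by_name}} and {{rename_by_ref}} are hypothetical names invented for this sketch. It only assumes that a {{Column}} reference carries some identity beyond its name, which is what would let a rename target one of two same-named columns.

```python
from dataclasses import dataclass, replace
from itertools import count

_ids = count()  # hypothetical source of unique column identities


@dataclass(frozen=True)
class Col:
    name: str
    uid: int  # stand-in for whatever identifies a resolved column


def make_cols(*names):
    """Build a toy 'DataFrame' as a list of uniquely identified columns."""
    return [Col(n, next(_ids)) for n in names]


def rename_by_name(cols, existing, new):
    """Model of a string-based rename: every column with that name changes."""
    return [replace(c, name=new) if c.name == existing else c for c in cols]


def rename_by_ref(cols, ref, new):
    """Model of a reference-based rename: only the referenced column changes."""
    return [replace(c, name=new) if c.uid == ref.uid else c for c in cols]


left_counts = make_cols("id", "count")
right_counts = make_cols("id", "count")
joined = left_counts + right_counts  # a join that keeps both "count" columns

by_name = [c.name for c in rename_by_name(joined, "count", "left_count")]
by_ref = [c.name for c in rename_by_ref(joined, left_counts[1], "left_count")]

print(by_name)  # ['id', 'left_count', 'id', 'left_count']
print(by_ref)   # ['id', 'left_count', 'id', 'count']
```

In this model the name-based rename touches both "count" columns, while the reference-based rename touches only the one that came from {{left_counts}}.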

I didn't follow what you are getting at regarding {{from_json}}, but does 
that address your concern?

> Support Column type for withColumn and withColumnRenamed apis
> -------------------------------------------------------------
>
>                 Key: SPARK-24853
>                 URL: https://issues.apache.org/jira/browse/SPARK-24853
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.2, 3.2.0
>            Reporter: nirav patel
>            Priority: Minor
>
> Can we add overloaded versions of withColumn and withColumnRenamed that accept 
> a Column instead of a String? That way I can specify the fully qualified name 
> (FQN) when there are duplicate column names. For example, if a join produces 
> two columns with the same name and I want to rename one of them, I can do 
> it with this new API.
>  
> This would be similar to the drop API, which supports both String and Column types.
>  
> def withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column 
> that has the same name.
>  
> def withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>  
>  
>  
> I think there should also be this one:
>  
> def
> withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>  



