[
https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437394#comment-17437394
]
Nicholas Chammas edited comment on SPARK-24853 at 11/2/21, 2:41 PM:
--------------------------------------------------------------------
[~hyukjin.kwon] - It's not just for consistency. As noted in the description,
this is useful when you are trying to rename a column with an ambiguous name.
For example, imagine two tables {{left}} and {{right}}, each with a column
called {{count}}:
{code:python}
(
    left_counts.alias('left')
    .join(right_counts.alias('right'), on='join_key')
    .withColumn(
        'total_count',
        left_counts['count'] + right_counts['count']
    )
    # no-op; alias doesn't work
    .withColumnRenamed('left.count', 'left_count')
    # incorrect; it renames both count columns
    .withColumnRenamed('count', 'left_count')
    # what, ideally, users want to do here
    .withColumnRenamed(left_counts['count'], 'left_count')
    .show()
){code}
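To make the ambiguity concrete, here is a small pure-Python toy model of the three calls above. It is only a sketch and not Spark's implementation: {{Col}}, {{make_frame}}, {{rename_by_name}}, and {{rename_by_ref}} are hypothetical names invented for illustration, and the {{uid}} field is an assumed stand-in for whatever identity a {{Column}} reference would carry.

```python
from dataclasses import dataclass, replace
from itertools import count

# Hypothetical stand-in for a per-column identity; not a Spark API.
_ids = count()


@dataclass(frozen=True)
class Col:
    """Toy column: a display name plus a unique identity."""
    name: str
    uid: int


def make_frame(*names):
    """Build a toy schema: a list of columns with fresh identities."""
    return [Col(name, next(_ids)) for name in names]


def rename_by_name(schema, existing, new):
    """Name-based rename: every column whose name matches is renamed."""
    return [replace(c, name=new) if c.name == existing else c for c in schema]


def rename_by_ref(schema, existing, new):
    """Reference-based rename (the proposed overload): match on identity."""
    return [replace(c, name=new) if c.uid == existing.uid else c for c in schema]


def names(schema):
    """Return just the column names, in order."""
    return [c.name for c in schema]


left_counts = make_frame("join_key", "count")
right_counts = make_frame("join_key", "count")

# Toy join on join_key: one key column, then the other columns of each side.
joined = [left_counts[0], left_counts[1], right_counts[1]]

# No column is literally named 'left.count', so nothing changes.
print(names(rename_by_name(joined, "left.count", "left_count")))
# Both 'count' columns match the name, so both are renamed.
print(names(rename_by_name(joined, "count", "left_count")))
# Only the column that came from left_counts is renamed.
print(names(rename_by_ref(joined, left_counts[1], "left_count")))
```

In this model only the reference-based variant can single out the left-hand {{count}} column, which is the behavior the requested overload would provide.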
If you don't mind, I'm going to reopen this issue.
> Support Column type for withColumn and withColumnRenamed apis
> -------------------------------------------------------------
>
> Key: SPARK-24853
> URL: https://issues.apache.org/jira/browse/SPARK-24853
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.2, 3.2.0
> Reporter: nirav patel
> Priority: Minor
>
> Can we add overloaded versions of withColumn and withColumnRenamed that
> accept a Column instead of a String? That way I can specify a fully
> qualified name (FQN) when there are duplicate column names, e.g. if I have
> two columns with the same name as a result of a join and I want to rename
> one of them, I can do it with this new API.
>
> This would be similar to the drop API, which supports both String and Column
> types.
>
> def withColumn(colName: Column, col: Column): DataFrame
> Returns a new Dataset by adding a column or replacing the existing column
> that has the same name.
>
> def withColumnRenamed(existingName: Column, newName: Column): DataFrame
> Returns a new Dataset with a column renamed.
>
> I think there should also be this one:
>
> def withColumnRenamed(existingName: *Column*, newName: *Column*): DataFrame
> Returns a new Dataset with a column renamed.
>