[ https://issues.apache.org/jira/browse/SPARK-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219890#comment-15219890 ]
Mikołaj Hnatiuk commented on SPARK-12235:
-----------------------------------------
I wouldn't recommend doing this. It would break the functional approach that
Spark and Scala encourage: you construct a new DataFrame every time you
modify something, instead of changing an existing one in place. This approach:
1) can easily be done in parallel;
2) guarantees that "calling a function f twice with the same value for an
argument x will produce the same result f(x) each time"
(https://en.wikipedia.org/wiki/Functional_programming).

For those who are a little bit lost, here is a great guide to FP in R:
http://www.datajujitsu.co.uk/blog/2013/05/16/functional-programming-in-r/ :)
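To make the "same input, same result" property concrete, here is a minimal plain-Python sketch. It is not SparkR code: `make_frame` and `with_column` are hypothetical helpers, and a "frame" is just a read-only mapping from column name to a tuple of values. Adding or replacing a column returns a new frame, so the original is never touched and repeated calls on the same input give equal results.

```python
from types import MappingProxyType


def make_frame(columns):
    """Build a read-only frame: a mapping of column name -> tuple of values."""
    return MappingProxyType({name: tuple(values) for name, values in columns.items()})


def with_column(frame, name, values):
    """Return a NEW frame with column `name` added or replaced.

    The input frame is never modified (and cannot be: it is read-only).
    """
    updated = dict(frame)          # copy the mapping; the tuples are immutable, so sharing is safe
    updated[name] = tuple(values)  # add or overwrite the column in the copy only
    return make_frame(updated)


df = make_frame({"x": [1, 2, 3]})
df2 = with_column(df, "y", [v * 2 for v in df["x"]])  # add a column
df3 = with_column(df2, "x", [0, 0, 0])                # replace a column

print(dict(df))  # {'x': (1, 2, 3)}, the original is unchanged
```

Because every step yields a fresh value, intermediate frames like `df` and `df2` stay valid and can be reused or processed independently.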
> Enhance mutate() to support replace existing columns
> ----------------------------------------------------
>
> Key: SPARK-12235
> URL: https://issues.apache.org/jira/browse/SPARK-12235
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Affects Versions: 1.5.2
> Reporter: Sun Rui
>
> mutate() in the dplyr package supports adding new columns and replacing
> existing columns. But currently the implementation of mutate() in SparkR
> supports adding new columns only.
> Also, make the behavior of mutate() more consistent with that of dplyr:
> 1. Throw an error when there are duplicate column names in the
> DataFrame being mutated.
> 2. When the arguments specify duplicate column names, the last column
> with a given name takes effect.
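For reference, the mutate() semantics requested in the issue can be modelled in plain Python as below. This is a hypothetical sketch, not SparkR's implementation or API: a frame is represented as a list of (name, values) pairs, and keeping a replaced column at its original position while appending new columns at the end is an assumption modelled on dplyr. Note that it still returns a new list rather than modifying its input.

```python
def mutate(frame_columns, new_columns):
    """Hypothetical model of the mutate() semantics described in SPARK-12235.

    frame_columns: list of (name, values) pairs, standing in for a DataFrame.
    new_columns:   list of (name, values) pairs passed as arguments.
    Returns a new list; the input list is never modified.
    """
    names = [name for name, _ in frame_columns]

    # Rule 1: refuse to mutate a frame that already has duplicate column names.
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate column names in DataFrame: {duplicates}")

    # Rule 2: among the arguments, the last column with a given name wins.
    # Later assignments overwrite earlier ones; the dict keeps first-seen order.
    latest = {}
    for name, values in new_columns:
        latest[name] = list(values)

    # Assumption (dplyr-like): replaced columns keep their position,
    # and genuinely new columns are appended in the order first given.
    result = [(name, latest.get(name, values)) for name, values in frame_columns]
    existing = set(names)
    result += [(name, values) for name, values in latest.items() if name not in existing]
    return result


df = [("a", [1, 2]), ("b", [3, 4])]
out = mutate(df, [("b", [0, 0]), ("c", [5, 6]), ("b", [9, 9])])
print(out)  # [('a', [1, 2]), ('b', [9, 9]), ('c', [5, 6])]
```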
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)