[ 
https://issues.apache.org/jira/browse/SPARK-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219890#comment-15219890
 ] 

Mikołaj Hnatiuk commented on SPARK-12235:
-----------------------------------------

I wouldn't recommend doing this. It would break the functional approach that 
Spark and Scala encourage: you construct a new DataFrame every time you 
modify something. That way:
1) the work can easily be done in parallel
2) "calling a function f twice with the same value for an argument x will 
produce the same result f(x) each time" 
(https://en.wikipedia.org/wiki/Functional_programming)

For those who are a little bit lost, here is a great guide to FP in R: 
http://www.datajujitsu.co.uk/blog/2013/05/16/functional-programming-in-r/ :)
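
To make the "construct a new dataframe every time" point concrete, here is a 
minimal sketch in plain Python. It is not SparkR or Spark code: the "frame" is 
just a dict of column tuples, and with_column() is a hypothetical helper named 
for this illustration only. It shows that a column can be changed by returning 
a new frame, leaving the input untouched, so repeated calls with the same 
input give the same result.

```python
def with_column(frame, name, values):
    """Return a new frame with column `name` added or replaced.

    `frame` is a plain dict mapping column names to tuples (an assumption
    made for this sketch); it is never modified.
    """
    values = tuple(values)
    # Row count is taken from any existing column, or from the new one
    # if the frame is empty.
    n_rows = len(next(iter(frame.values()))) if frame else len(values)
    if len(values) != n_rows:
        raise ValueError(
            f"column {name!r} has {len(values)} rows, expected {n_rows}"
        )
    new_frame = dict(frame)  # shallow copy; the column tuples are immutable
    new_frame[name] = values
    return new_frame


original = {"x": (1, 2, 3)}

# Two independent calls on the same input produce equal results ...
doubled = with_column(original, "x", (v * 2 for v in original["x"]))
again = with_column(original, "x", (v * 2 for v in original["x"]))

# ... and the input frame still holds its original values.
assert original == {"x": (1, 2, 3)}
assert doubled == again
```

Because nothing is modified in place, the two calls above could run in any 
order, or at the same time, without affecting each other.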

> Enhance mutate() to support replace existing columns
> ----------------------------------------------------
>
>                 Key: SPARK-12235
>                 URL: https://issues.apache.org/jira/browse/SPARK-12235
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Sun Rui
>
> mutate() in the dplyr package supports adding new columns and replacing 
> existing columns. But currently the implementation of mutate() in SparkR 
> supports adding new columns only.
> Also make the behavior of mutate more consistent with that in dplyr.
> 1. Throw an error when there are duplicated column names in the 
> DataFrame being mutated.
> 2. When the columns specified by arguments contain duplicated names, 
> the last column of the same name takes effect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
