[ https://issues.apache.org/jira/browse/SPARK-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595171#comment-14595171 ]
Ben Sully commented on SPARK-7499:
----------------------------------
I just had a quick go and it seems quite possible:
```
name_to_col <- function(colname, data) {
  eval(parse(text = paste0(substitute(data), "$", colname)))
}
```
The column name, and then the column itself, could be extracted using
lazyeval::all_dots like this:
```
select_.DataFrame <- function(.data, ..., .dots) {
  dots <- lazyeval::all_dots(..., .dots)
  colnames <- sapply(dots, function(x) x$expr)
  columns <- sapply(colnames, name_to_col, data = .data)
  do_something_with_columns()
}

select(df, name, age)
```
The method I've used is quite primitive, in that it just uses `substitute` to
find out the name of the DataFrame object being passed, then uses `eval` to
resolve the {tablename}${column} into the actual object which then gets passed
to SparkR. It feels quite hacky though, since it relies on the `$` method of
DataFrames.
This just adds a method to dplyr that allows its select, mutate, filter,
group_by and summarise functions to work with DataFrames. I think the part
that lets it work without strings or "df$age" is the lazyeval package.
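A less hacky variant might avoid `eval(parse(...))` altogether by capturing the bare names and looking each one up with `[[`. The sketch below is only an illustration in base R: `bare_cols` is a hypothetical helper, and a plain `data.frame` stands in for a SparkR DataFrame, on the assumption that the real object supports `[[` with a column name.

```r
# Stand-in for a SparkR DataFrame (assumption: a plain data.frame is used
# here only so the sketch runs without Spark).
df <- data.frame(name = c("a", "b"), age = c(30, 12), stringsAsFactors = FALSE)

# Hypothetical helper: capture the unevaluated arguments passed in `...`
# and look each one up as a column of `data`, without building code strings.
bare_cols <- function(data, ...) {
  exprs <- eval(substitute(alist(...)))          # list of unevaluated arguments
  stopifnot(all(vapply(exprs, is.name, logical(1))))  # only bare names handled
  colnames <- vapply(exprs, as.character, character(1))
  names(colnames) <- colnames
  lapply(colnames, function(nm) data[[nm]])      # named list of columns
}

cols <- bare_cols(df, name, age)
names(cols)
```

This only handles bare column names; expressions such as `age > 10` would need the captured call to be evaluated against the columns instead.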
> Investigate how to specify columns in SparkR without $ or strings
> -----------------------------------------------------------------
>
> Key: SPARK-7499
> URL: https://issues.apache.org/jira/browse/SPARK-7499
> Project: Spark
> Issue Type: Improvement
> Components: SparkR
> Reporter: Shivaram Venkataraman
>
> Right now in SparkR we need to specify the columns used using `$` or strings.
> For example to run select we would do
> {code}
> df1 <- select(df, df$age > 10)
> {code}
> It would be good to infer the set of columns in a dataframe automatically and
> resolve symbols for column names. For example
> {code}
> df1 <- select(df, age > 10)
> {code}
> One way to do this is to build an environment with all the column names to
> column handles and then use `substitute(arg, env = columnNameEnv)`
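The environment idea from the issue description could be sketched roughly as follows. This is an assumption-laden illustration, not SparkR code: `resolve_columns` is a hypothetical name, a plain `data.frame` stands in for the DataFrame, and the "column handles" are simply its column vectors. Because `substitute` does not evaluate its first argument, the captured expression is passed to it through `do.call`.

```r
# Stand-in for a SparkR DataFrame (assumption, for illustration only).
df <- data.frame(age = c(5, 12, 40))

# Hypothetical helper: map column names to column handles in an environment,
# then substitute those handles into the user's unevaluated expression.
resolve_columns <- function(data, cond) {
  cond_expr <- substitute(cond)              # the expression as written by the caller
  column_env <- list2env(as.list(data))      # column name -> column handle
  # Replace every symbol that names a column with the column itself.
  resolved <- do.call(substitute, list(cond_expr, column_env))
  eval(resolved, parent.frame())             # evaluate the rewritten call
}

result <- resolve_columns(df, age > 10)
```

Symbols that do not name a column are left untouched by `substitute`, so they are still looked up in the caller's environment when the rewritten call is evaluated.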