[
https://issues.apache.org/jira/browse/ARROW-16701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson updated ARROW-16701:
------------------------------------
Description:
Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap
those and allow someone to insert arbitrary SQL into a dplyr query?
Something like:
{code:r}
sql <- function(data, sql) {
tbl <- to_duckdb(data)
res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)
duckdb::duckdb_fetch_record_batch(res)
}
ds %>%
filter(year > 2020) %>%
sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
compute()
{code}
This won't work totally, but is vaguely what we're looking for.
One part that we need to think about is how to deal with the {{from}} clause, a
few possibilities:
* ibis does this by making you "name" the table before doing sql so you can
FROM explicitly
* though maybe you could get away with FROM . like it is a magrittr thing and
sub that
* empty string, and we add it in based on the lazy_tbl object
Possibly related prior art:
https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't
perfect IMO, and I think this is more geared towards package developers than
end users?)
was:
Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap
those and allow someone to insert arbitrary SQL into a dplyr query?
Something like:
{code:r}
sql <- function(data, sql) {
tbl <- to_duckdb(data)
res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)
duckdb::duckdb_fetch_record_batch(res)
}
ds %>%
filter(year > 2020) %>%
sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
compute()
{code}
This won't work totally, but is vaguely what we're looking for.
One part that we need to think about is how to deal with the {{from}} clause, a
few possibilities:
* bis does this by making you "name" the table before doing sql so you can FROM
explicitly
* though maybe you could get away with FROM . like it is a magrittr thing and
sub that
* empty string, and we add it in based on the lazy_tbl object
Possibly related prior art:
https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't
perfect IMO, and I think this is more geared towards package developers than
end users?)
> [R] Can we execute SQL in a dplyr pipeline?
> -------------------------------------------
>
> Key: ARROW-16701
> URL: https://issues.apache.org/jira/browse/ARROW-16701
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Jonathan Keane
> Priority: Major
>
> Now that we have {{to_duckdb()}} and {{to_arrow()}} is it possible to wrap
> those and allow someone to insert arbitrary SQL into a dplyr query?
> Something like:
> {code:r}
> sql <- function(data, sql) {
> tbl <- to_duckdb(data)
> res <- DBI::dbSendQuery(dbplyr::remote_con(.data), sql, arrow = TRUE)
> duckdb::duckdb_fetch_record_batch(res)
> }
> ds %>%
> filter(year > 2020) %>%
> sql("SELECT tip_amount, fare_amount, total_amount FROM ") %>%
> compute()
> {code}
> This won't work totally, but is vaguely what we're looking for.
> One part that we need to think about is how to deal with the {{from}} clause,
> a few possibilities:
> * ibis does this by making you "name" the table before doing sql so you can
> FROM explicitly
> * though maybe you could get away with FROM . like it is a magrittr thing and
> sub that
> * empty string, and we add it in based on the lazy_tbl object
> Possibly related prior art:
> https://dbplyr.tidyverse.org/reference/build_sql.html (though the name isn't
> perfect IMO, and I think this is more geared towards package developers than
> end users?)
--
This message was sent by Atlassian Jira
(v8.20.7#820007)