[GitHub] [arrow] jonkeane opened a new pull request #10780: ARROW-12688: [R] Use DuckDB to query an Arrow Dataset

GitBox Thu, 22 Jul 2021 08:20:18 -0700


jonkeane opened a new pull request #10780:
URL: https://github.com/apache/arrow/pull/10780



   A proposed interface for using DuckDB + Arrow together.
   
   I've added two methods, plus a Python proof of concept:
     * The proposed `summarise(..., .engine = "duckdb")` method which is 
(probably) the method that people want to use
     * A lower-level method of specifying exactly when the transfer takes 
place. I've called this `alchemize_*` for now, though we might consider wedging 
it into `collect()` or `compute()` (or something like `collect_to_duckdb()` to 
be super explicit[1]).
     * I've also made a proof of concept showing that `alchemize_*` can work 
with Python; this is basically a renaming/wrapping of `r_to_py` / `py_to_r`. If 
we do pursue exposing `alchemize_*` or the like, I will fill out the rest of 
these (we should keep both around, though `r_to_py` isn't currently documented, 
so it probably isn't getting much use).
     
     [1] I've also tried a more magical `alchemize(x, to = c("arrow", "duckdb", 
"python"))` that changes behavior / output based on the `to` argument, which we 
can go back to if we want that simplicity, but I found it harder to reason 
about what I was getting out, whereas with `alchemize_to_duckdb()` the 
function name says exactly what's going on.
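
     The trade-off in [1] can be sketched in base R. This is only an 
illustration, not the code in this PR: `as_fake_arrow()` and `as_fake_duckdb()` 
are hypothetical stand-ins that tag a value with a class and never touch Arrow 
or DuckDB, and the `alchemize*` bodies are assumed for the sake of the example.

```r
# Hypothetical stand-ins: they only attach a class so the difference in
# return types is visible. They do not call Arrow or DuckDB.
as_fake_arrow  <- function(x) structure(list(data = x), class = "fake_arrow_table")
as_fake_duckdb <- function(x) structure(list(data = x), class = "fake_duckdb_tbl")

# Style 1: a single entry point; the class of the result depends on `to`.
alchemize <- function(x, to = c("arrow", "duckdb")) {
  to <- match.arg(to)  # defaults to the first choice, "arrow"
  switch(to,
    arrow  = as_fake_arrow(x),
    duckdb = as_fake_duckdb(x)
  )
}

# Style 2: one function per destination; the name states the result type.
alchemize_to_arrow  <- function(x) as_fake_arrow(x)
alchemize_to_duckdb <- function(x) as_fake_duckdb(x)

df <- data.frame(a = 1:3)
res_magic    <- alchemize(df, to = "duckdb")
res_explicit <- alchemize_to_duckdb(df)

# Both calls produce the same kind of object; only the call site differs.
identical(class(res_magic), class(res_explicit))
```

     With the first style a reader has to inspect the value of `to` (which may 
be a variable set elsewhere) to know what comes back; with the second, the 
call site alone is enough.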


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
