paleolimbot commented on a change in pull request #11915:
URL: https://github.com/apache/arrow/pull/11915#discussion_r772307168



##########
File path: r/vignettes/developers/bindings.Rmd
##########
@@ -0,0 +1,238 @@
+---
+title: "Writing Bindings"
+---
+
+```{r, include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+library(dplyr, warn.conflicts = FALSE)
+```
+
+
+When writing bindings between C++ compute functions and R functions, the aim is
+to expose the C++ functionality via existing R functions. The syntax and 
+functionality should match that of the existing R functions 
+(though with some exceptions) so that users are able to use existing tidyverse 
+or base R syntax, or call existing S3 methods on objects, whilst taking 
+advantage of the speed and functionality of the underlying arrow package.
+
+# Implementing bindings for S3 generics
+
+If a function is an S3 generic, you may be able to define a method for it that
+works with Arrow objects. There are two base classes which have been defined in
+the R package so that S3 methods don't have to be defined repeatedly for objects
+with similar behaviour:
+
+* ArrowTabular - for RecordBatch and Table objects
+* ArrowDatum - for Scalar, Array, and ChunkedArray objects
+
+This means that any method defined for the base class will also work for its
+child classes. For example, a `dim()` method for `ArrowTabular` may be defined as:
+
+```{r, eval = FALSE}
+dim.ArrowTabular <- function(x) c(x$num_rows, x$num_columns)
+```
+
+This implements `dim()` for both RecordBatch and Table objects.
+
+```{r}
+arrow_table(x = c(1, 2, 3), y = c(4, 5, 6)) %>%
+  dim()
+```
+
+# Implementing bindings to work within dplyr pipelines
+
+One of the main ways in which users interact with arrow is via dplyr syntax
+called on Arrow objects. For example, when a user calls `dplyr::mutate()` on an
+`ArrowTabular`, `Dataset`, or `arrow_dplyr_query` object, the Arrow
+implementation of `mutate()` is used, which, under the hood, translates the
+dplyr code into Arrow C++ code.
+
+When using `dplyr::mutate()` or `dplyr::filter()`, you may want to use functions
+from other packages. The example below uses `stringr::str_detect()`.
+
+```{r}
+library(dplyr)
+library(stringr)
+starwars %>%
+  filter(str_detect(name, "Darth"))
+```
+This functionality has also been implemented in Arrow, e.g.:
+
+```{r}
+library(arrow)
+arrow_table(starwars) %>%
+  filter(str_detect(name, "Darth")) %>%
+  collect()
+```
+
+This is possible as a **binding** has been created between the stringr function

Review comment:
       Sorry I missed this last week! I like how you've rephrased it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
