thisisnic commented on a change in pull request #11915:
URL: https://github.com/apache/arrow/pull/11915#discussion_r771365295



##########
File path: r/vignettes/developers/bindings.Rmd
##########
@@ -0,0 +1,238 @@
+---
+title: "Writing Bindings"
+---
+
+```{r, include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+library(dplyr, warn.conflicts = FALSE)
+```
+
+
+When writing bindings between C++ compute functions and R functions, the aim is
+to expose the C++ functionality via existing R functions. The syntax and 
+functionality should match that of the existing R functions 
+(though with some exceptions) so that users are able to use existing tidyverse 
+or base R syntax, or call existing S3 methods on objects, whilst taking 
+advantage of the speed and functionality of the underlying arrow package.
+
+# Implementing bindings for S3 generics
+
+If a function is an S3 generic method, you may be able to define a version of it for
+Arrow objects.  There are two base classes which have been defined in the
+R package so that S3 methods don't have to be defined repeatedly for objects with
+similar behaviour:
+
+* ArrowTabular - for RecordBatch and Table objects
+* ArrowDatum - for Scalar, Array, and ChunkedArray objects
+
+What this means is that any function defined for the base class will work with 
+the child class.  For example, the function `dim()` may be defined as:
+
+```{r, eval = FALSE}
+dim.ArrowTabular <- function(x) c(x$num_rows, x$num_columns)
+```
+
+This implements `dim()` for both RecordBatch and Table objects.
+
+```{r}
+arrow_table(x = c(1, 2, 3), y = c(4, 5, 6)) %>%
+  dim()
+```
+
+# Implementing bindings to work within dplyr pipelines
+
+One of the main ways in which users interact with arrow is via dplyr syntax called 
+on Arrow objects.  For example, when a user calls `dplyr::mutate()` on an Arrow Tabular, 
+Dataset, or arrow data query object, the Arrow implementation of `mutate()` is 
+used and, under the hood, translates the dplyr code into Arrow C++ code.
+
+When using `dplyr::mutate()` or `dplyr::filter()`, you may want to use functions
+from other packages.  The example below uses `stringr::str_detect()`.
+
+```{r}
+library(dplyr)
+library(stringr)
+starwars %>%
+  filter(str_detect(name, "Darth"))
+```
+This functionality has also been implemented in Arrow, e.g.:
+
+```{r}
+library(arrow)
+arrow_table(starwars) %>%
+  filter(str_detect(name, "Darth")) %>%
+  collect()
+```
+
+This is possible as a **binding** has been created between the stringr function

Review comment:
       I love this point, yeah, I see what you mean; this could cause confusion.  What about now that I've rephrased it?
   
   > This is possible as a **binding** has been created between the call to the 
   > stringr function `str_detect()` and the Arrow C++ code, here as a direct mapping
   > to `match_substring_regex`.  You can see this for yourself by inspecting the 
   > arrow data query object without retrieving the results via `collect()`.
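
For readers following along, the inspection step described in the rephrased text could look roughly like the sketch below. It assumes the arrow, dplyr, and stringr packages are installed; the three-row table and the variable names `characters`, `query`, and `result` are made up for illustration and are not part of the vignette.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# Hypothetical toy data, standing in for the starwars example in the vignette.
characters <- arrow_table(
  name = c("Darth Vader", "Luke Skywalker", "Darth Maul")
)

# Build the query but do not call collect(): nothing is pulled into R yet.
query <- characters %>%
  filter(str_detect(name, "Darth"))

# Printing the query object shows the pending filter expression. According to
# the review comment, this is where the mapping to the Arrow C++ function
# `match_substring_regex` should be visible.
print(query)

# Only collect() evaluates the query and returns the matching rows to R.
result <- collect(query)
```

Keeping the query and the `collect()` call as separate steps makes it easy to look at the translated expression before any data is materialised.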




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


