This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new d7bc6b2 [R] 93 - dplyr chapter feedback (#94)
d7bc6b2 is described below
commit d7bc6b230631488da7ee100402d7c8270463d2d5
Author: Nic <[email protected]>
AuthorDate: Tue Oct 26 13:47:52 2021 +0300
[R] 93 - dplyr chapter feedback (#94)
* Fix bullet points
* Ensure it's obvious arrow is doing the work
* chunks
---
r/content/tables.Rmd | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/r/content/tables.Rmd b/r/content/tables.Rmd
index 1c935d7..76ea0db 100644
--- a/r/content/tables.Rmd
+++ b/r/content/tables.Rmd
@@ -55,14 +55,15 @@ test_that("dplyr_raw and dplyr_arrow chunk provide the same results", {
You'll notice we've used `collect()` in the Arrow pipeline above. That's because
one of the ways in which `arrow` is efficient is that it works out the instructions
-for the calculations it needs to perform (_expressions_) and only runs them once
-you actually pull the data into your R session. This means instead of doing
-lots of separate operations, it does them all at once in a more optimised way,
-_lazy evaluation_.
+for the calculations it needs to perform (_expressions_) and only runs them
+using arrow once you actually pull the data into your R session. This means
+instead of doing lots of separate operations, it does them all at once in a
+more optimised way, _lazy evaluation_.
It also means that you are able to manipulate data that is larger than you can
fit into memory on the machine you're running your code on, if you only pull
-data into R when you have selected the desired subset.
+data into R when you have selected the desired subset, or when using functions
+which can operate on chunks of data.
You can also have data which is split across multiple files. For example, you
might have files which are stored in multiple Parquet or Feather files,
@@ -173,6 +174,7 @@ test_that("dplyr_func_warning", {
## Use arrow functions in dplyr verbs in arrow
You want to use a function which is implemented in Arrow's C++ library but
either:
+
* it doesn't have a mapping to a base R or tidyverse equivalent, or
* it has a mapping but nevertheless you want to call the C++ function directly
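
Note (not part of the commit): the lazy evaluation described in the first hunk can be sketched roughly as follows. This is a minimal illustration, assuming the `arrow` and `dplyr` packages are installed; the built-in `mtcars` data and the names `cars_tbl`, `query`, and `result` are chosen here for the example and do not appear in the cookbook.

```r
library(arrow)
library(dplyr)

# Build an in-memory Arrow Table from a built-in data frame
# (illustrative data only; a real use case would more likely read files)
cars_tbl <- Table$create(mtcars)

# These verbs only record what should happen; no rows are filtered yet
query <- cars_tbl %>%
  filter(cyl == 4) %>%
  select(mpg, cyl)

# collect() runs the recorded steps in Arrow and returns a data frame to R
result <- collect(query)
```

Until `collect()` is called, `query` is a description of the work rather than its result, so only the rows and columns that survive the filter and selection are pulled into the R session.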