This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new d7bc6b2  [R] 93 - dplyr chapter feedback (#94)
d7bc6b2 is described below

commit d7bc6b230631488da7ee100402d7c8270463d2d5
Author: Nic <[email protected]>
AuthorDate: Tue Oct 26 13:47:52 2021 +0300

    [R] 93 - dplyr chapter feedback (#94)
    
    * Fix bullet points
    
    * Ensure it's obvious arrow is doing the work
    
    * chunks
---
 r/content/tables.Rmd | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/r/content/tables.Rmd b/r/content/tables.Rmd
index 1c935d7..76ea0db 100644
--- a/r/content/tables.Rmd
+++ b/r/content/tables.Rmd
@@ -55,14 +55,15 @@ test_that("dplyr_raw and dplyr_arrow chunk provide the same results", {
 
 You'll notice we've used `collect()` in the Arrow pipeline above.  That's because 
 one of the ways in which `arrow` is efficient is that it works out the instructions
-for the calculations it needs to perform (_expressions_) and only runs them once 
-you actually pull the data into your R session.  This means instead of doing 
-lots of separate operations, it does them all at once in a more optimised way, 
-_lazy evaluation_.
+for the calculations it needs to perform (_expressions_) and only runs them 
+using arrow once you actually pull the data into your R session.  This means 
+instead of doing lots of separate operations, it does them all at once in a 
+more optimised way, _lazy evaluation_.
 
 It also means that you are able to manipulate data that is larger than you can 
 fit into memory on the machine you're running your code on, if you only pull 
-data into R when you have selected the desired subset. 
+data into R when you have selected the desired subset, or when using functions 
+which can operate on chunks of data. 
 
 You can also have data which is split across multiple files.  For example, you
 might have files which are stored in multiple Parquet or Feather files, 
@@ -173,6 +174,7 @@ test_that("dplyr_func_warning", {
 ## Use arrow functions in dplyr verbs in arrow
 
 You want to use a function which is implemented in Arrow's C++ library but either:
+
 * it doesn't have a mapping to a base R or tidyverse equivalent, or 
 * it has a mapping but nevertheless you want to call the C++ function directly
 
