thisisnic opened a new issue #93:
URL: https://github.com/apache/arrow-cookbook/issues/93


   > You’ll notice we’ve used collect() in the Arrow pipeline above. That’s 
because one of the ways in which arrow is efficient is that it works out the 
instructions for the calculations it needs to perform (expressions) and only 
runs them once you actually pull the data into your R session.
   
   We might rephrase this to make it clear that the computation only happens 
when you trigger it, and that the computation happens in Arrow, not in R.
   
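   A minimal sketch of the distinction, assuming a recent arrow release where 
`arrow_table()` is available (older releases use `Table$create()`). The dplyr 
verbs only record expressions; Arrow's C++ engine does the work when 
`collect()` is called, and only the result comes back into R:

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Create an in-memory Arrow Table from a built-in R data frame
tbl <- arrow_table(mtcars)

# These verbs are not evaluated yet; they only build up the query
query <- tbl %>%
  filter(cyl == 6) %>%
  select(mpg, cyl)

# `query` is an unevaluated arrow_dplyr_query, not a data frame
print(class(query))

# collect() runs the computation in Arrow (C++) and returns the result to R
result <- collect(query)
print(nrow(result))
```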
   > It also means that you are able to manipulate data that is larger than you 
can fit into memory on the machine you’re running your code on, if you only 
pull data into R when you have selected the desired subset.
   
   We can also operate on chunks of data, so you might not even need to 
subset it to be smaller than memory; the compute kernels just need to work 
through chunks that are smaller than memory. I’m not sure if we want or need 
to mention that here; it's just something to note.
   
   > You want to use a function which is implemented in Arrow’s C++ library but 
either: * it doesn’t have a mapping to a base R or tidyverse equivalent, or * 
it has a mapping but nevertheless you want to call the C++ function directly
   
   It looks like the bullets aren’t being rendered here (probably needs an 
extra blank line somewhere).
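   Separately from the formatting, it might help to show what calling the C++ 
function directly looks like. A hedged sketch using `call_function()`, where 
the `"add"` kernel is just an illustrative choice:

```r
library(arrow, warn.conflicts = FALSE)

x <- Array$create(1:3)
y <- Array$create(4:6)

# Call Arrow's C++ "add" compute function by name, bypassing any R mapping
sum_xy <- call_function("add", x, y)

# Convert the resulting Arrow Array back to an R vector
print(as.vector(sum_xy))
```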


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
