thisisnic opened a new issue #93:

URL: https://github.com/apache/arrow-cookbook/issues/93
> You’ll notice we’ve used `collect()` in the Arrow pipeline above. That’s because one of the ways in which arrow is efficient is that it works out the instructions for the calculations it needs to perform (expressions) and only runs them once you actually pull the data into your R session.

We might rephrase this to make it clear that the computation only happens when you trigger it, and that the computation happens in Arrow, not in R.

> It also means that you are able to manipulate data that is larger than you can fit into memory on the machine you’re running your code on, if you only pull data into R when you have selected the desired subset.

We can also operate on data in chunks, so you might not even need to subset it to something smaller than memory; the compute kernels just need to be able to finish working through chunks that are smaller than memory. I’m not sure whether we want or need to mention that here; it's just something to note.

> You want to use a function which is implemented in Arrow’s C++ library but either: * it doesn’t have a mapping to a base R or tidyverse equivalent, or * it has a mapping but nevertheless you want to call the C++ function directly

It looks like the bullets aren’t being rendered as a list here (probably needs a stupid extra blank line somewhere, most likely before the first bullet).
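
The behaviour described in the first two quoted passages can be sketched as follows. This is a minimal sketch that assumes a recent version of the arrow R package built with Parquet support, plus dplyr. The temporary dataset written from `mtcars` is only an illustrative stand-in for data too large for memory, and the names `dataset_dir`, `ds`, `query` and `result` are hypothetical. Building the pipeline only records Arrow expressions. Nothing is computed until `collect()`, which is when Arrow's C++ engine scans the files (in batches of records) and hands just the filtered subset back to R.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Write a small partitioned Parquet dataset to a temporary directory
# (illustrative stand-in for a dataset that would not fit in memory).
dataset_dir <- tempfile("mtcars_dataset")
write_dataset(mtcars, dataset_dir, partitioning = "cyl")

# open_dataset() discovers the files and reads the schema;
# no rows are loaded into R at this point.
ds <- open_dataset(dataset_dir)

# Building the pipeline records Arrow expressions but computes nothing yet:
# the result is an arrow_dplyr_query, not a data frame.
query <- ds %>%
  filter(mpg > 25) %>%
  select(mpg, cyl)
print(class(query))

# collect() triggers the computation, which runs in Arrow's C++ engine;
# only the filtered subset is returned to R as a tibble.
result <- collect(query)
print(result)
```

Calling `collect()` earlier in the pipeline, before `filter()`, would instead pull every row into R and do the filtering there.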

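The third quoted passage, about calling a C++ function directly, can be sketched with the arrow R package's `call_function()` and `list_compute_functions()`. This sketch assumes a recent version of the arrow package. `utf8_upper` was chosen only to illustrate the second bullet: it has an obvious base R equivalent (`toupper()`), but here it is called by its Arrow name. The variable names are hypothetical.

```r
library(arrow, warn.conflicts = FALSE)

# List the available Arrow compute functions whose names match "utf8_upper".
print(list_compute_functions("utf8_upper"))

# Create an Arrow Array and call the C++ kernel by its Arrow name.
words <- Array$create(c("arrow", "cookbook"))
upper <- call_function("utf8_upper", words)

# The result is still an Arrow Array; as.vector() converts it to an R vector.
print(as.vector(upper))
```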