thisisnic commented on a change in pull request #78:
URL: https://github.com/apache/arrow-cookbook/pull/78#discussion_r720685439
########## File path: r/content/tables.Rmd ##########

@@ -0,0 +1,251 @@
+# Manipulating Data - Tables
+
+__What you should know before you begin__
+
+When you call dplyr verbs from Arrow, behind the scenes this generates
+instructions which tell Arrow how to manipulate the data in the way you've
+specified. These instructions are called _expressions_. Until you pull the
+data back into R, expressions don't do any work to actually retrieve or
+manipulate any data. This is known as _lazy evaluation_ and means that you can
+build up complex expressions that perform multiple actions, and are efficiently
+evaluated all at once when you retrieve the data. It also means that you are
+able to manipulate data that is larger than
+you can fit into memory on the machine you're running your code on, if you only
+pull data into R when you have selected the desired subset.
+
+You can also have data which is split across multiple files. For example, you
+might have files which are stored in multiple Parquet or Feather files,
+partitioned across different directories. You can open multi-file datasets
+using `open_dataset()` as discussed in a previous chapter, and then manipulate
+this data using arrow before even reading any of it into R.
+

Review comment:
What are row groups and statistics filters?
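A minimal sketch of the workflow the quoted passage describes (lazy dplyr expressions on a multi-file dataset, evaluated only at `collect()`), assuming the arrow and dplyr packages are installed. The built-in `mtcars` data, the temporary path `mtcars_dataset`, and the choice of `cyl` as the partitioning column are illustrative assumptions, not taken from the PR.

```r
library(arrow)
library(dplyr)

# Illustrative setup: write mtcars as Parquet files partitioned by `cyl`,
# so the data ends up split across several files in separate directories
dataset_path <- file.path(tempdir(), "mtcars_dataset")
write_dataset(mtcars, dataset_path, format = "parquet", partitioning = "cyl")

# Open the multi-file dataset; this reads metadata only, not the rows
ds <- open_dataset(dataset_path)

# Chain dplyr verbs; this only builds up expressions (lazy evaluation)
query <- ds %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, hp)

# collect() evaluates the expressions and pulls the result into R
result <- collect(query)
print(result)
```

Until `collect()` is called, `query` is an unevaluated query object rather than a data frame, which is what lets the filter and column selection be applied before any data is read into R.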