[GitHub] [arrow-cookbook] thisisnic commented on a change in pull request #78: ARROW-13732: [Doc][Cookbook] Manipulating and analyze Arrow data with dplyr verbs - R

GitBox Sat, 02 Oct 2021 07:22:01 -0700


thisisnic commented on a change in pull request #78:
URL: https://github.com/apache/arrow-cookbook/pull/78#discussion_r720685439




##########
File path: r/content/tables.Rmd
##########
@@ -0,0 +1,251 @@
+# Manipulating Data - Tables
+
+__What you should know before you begin__
+
+When you call dplyr verbs from Arrow, behind the scenes this generates 
+instructions which tell Arrow how to manipulate the data in the way you've 
+specified.  These instructions are called _expressions_.  Until you pull the 
+data back into R, expressions don't do any work to actually retrieve or 
+manipulate any data. This is known as _lazy evaluation_ and means that you can 
+build up complex expressions that perform multiple actions, and are efficiently 
+evaluated all at once when you retrieve the data.  It also means that you are 
+able to manipulate data that is larger than
+you can fit into memory on the machine you're running your code on, if you only 
+pull data into R when you have selected the desired subset. 
+
+You can also have data which is split across multiple files.  For example, you
+might have data which is stored in multiple Parquet or Feather files, 
+partitioned across different directories.  You can open multi-file datasets 
+using `open_dataset()` as discussed in a previous chapter, and then manipulate 
+this data using arrow before even reading any of it into R.
+

Review comment:
       What are row groups and statistics filters?
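
For readers of this thread, here is a minimal sketch of the lazy evaluation and multi-file dataset behaviour described in the hunk above. It assumes the `arrow` and `dplyr` packages are installed. The temporary directory, the use of `mtcars`, and the partitioning by `cyl` are illustrative choices and are not taken from the pull request.

```r
library(arrow)
library(dplyr)

# Illustrative setup (an assumption, not from the PR): write mtcars as a
# multi-file Parquet dataset, one directory per value of `cyl`
path <- tempfile()
write_dataset(mtcars, path, partitioning = "cyl")

# Open the dataset; this does not read the data into R
ds <- open_dataset(path)

# Calling dplyr verbs only records what should be done; no rows are read yet
query <- ds %>%
  filter(cyl == 4) %>%
  select(mpg, cyl, hp)

# collect() evaluates the whole query at once and returns the result to R
result <- collect(query)
```

Until `collect()` is called, `query` is only a description of the work to do, so printing it should show the pending steps rather than any rows.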




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
