[GitHub] [arrow] nealrichardson commented on a change in pull request #10765: ARROW-13399: [R] Update dataset.Rmd vignette

GitBox Wed, 04 Aug 2021 06:18:19 -0700


nealrichardson commented on a change in pull request #10765:
URL: https://github.com/apache/arrow/pull/10765#discussion_r682606477




##########
File path: r/vignettes/dataset.Rmd
##########
@@ -159,37 +171,37 @@ See $metadata for additional Schema metadata
 
 The other form of partitioning currently supported is [Hive](https://hive.apache.org/)-style,
 in which the partition variable names are included in the path segments.
-If we had saved our files in paths like
+If you had saved your files in paths like:
 
 ```
 year=2009/month=01/data.parquet
 year=2009/month=02/data.parquet
 ...
 ```
 
-we would not have had to provide the names in `partitioning`:
-we could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
+you would not have had to provide the names in `partitioning`;
+you could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
 would have been detected automatically.
 
 ## Querying the dataset
 
-Up to this point, we haven't loaded any data: we have walked directories to find
-files, we've parsed file paths to identify partitions, and we've read the
-headers of the Parquet files to inspect their schemas so that we can make sure
-they all line up.
+Up to this point, you haven't loaded any data.  You've walked directories to find
+files, you've parsed file paths to identify partitions, and you've read the
+headers of the Parquet files to inspect their schemas so that you can make sure
+they all are as expected.
 
-In the current release, `arrow` supports the dplyr verbs `mutate()`, 
+In the current release, arrow supports the dplyr verbs `mutate()`, 
 `transmute()`, `select()`, `rename()`, `relocate()`, `filter()`, and 
 `arrange()`. Aggregation is not yet supported, so before you call `summarise()`
 or other verbs with aggregate functions, use `collect()` to pull the selected
 subset of the data into an in-memory R data frame.
 
-If you attempt to call unsupported `dplyr` verbs or unimplemented functions in
-your query on an Arrow Dataset, the `arrow` package raises an error. However,
-for `dplyr` queries on `Table` objects (which are typically smaller in size) the
-package automatically calls `collect()` before processing that `dplyr` verb.
+Suppose you attempt to call unsupported dplyr verbs or unimplemented functions
+in your query on an Arrow Dataset. In that case, the arrow package raises an error. However,
+for dplyr queries on Arrow Table objects (typically smaller in size than Datasets), the

Review comment:
       ```suggestion
   for dplyr queries on Arrow Table objects (which are already in memory), the
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

