nealrichardson commented on a change in pull request #10765:
URL: https://github.com/apache/arrow/pull/10765#discussion_r679285857
##########
File path: r/vignettes/dataset.Rmd
##########
@@ -313,27 +330,29 @@ instead of a file path, or simply concatenate them like `big_dataset <- c(ds1, d
 As you can see, querying a large dataset can be made quite fast by storage in an
 efficient binary columnar format like Parquet or Feather and partitioning based on
-columns commonly used for filtering. However, we don't always get our data delivered
-to us that way. Sometimes we start with one giant CSV. Our first step in analyzing data
+columns commonly used for filtering. However, data isn't always stored that way.
+Sometimes you might start with one giant CSV. The first step in analyzing data
 is cleaning it up and reshaping it into a more usable form.
-The `write_dataset()` function allows you to take a Dataset or other tabular data object---an Arrow `Table` or `RecordBatch`, or an R `data.frame`---and write it to a different file format, partitioned into multiple files.
+The `write_dataset()` function allows you to take a Dataset or another tabular
+data object - an Arrow Table or RecordBatch, or an R data frame - and write
Review comment:
I know 😞 I think I'd go with "data frame" so as to (in theory) avoid the
"you said data.frame but it's a tibble" objection, though my reasoning for picking one
or the other is more about what's in my head when I'm writing. And you're
right: the distinction is not perfect in the real world, nor may the reader
share my distinction.
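
For concreteness, a minimal sketch of the `write_dataset()` workflow the revised passage describes. The example data (`mtcars`), the partitioning column (`cyl`), and the output path are placeholders chosen for illustration, not taken from the vignette or the PR:

```r
library(arrow)
library(dplyr)

# Take an ordinary R data frame (a Table, RecordBatch, or Dataset would also work)
# and write it out as a partitioned, multi-file Parquet dataset.
write_dataset(mtcars, "mtcars_ds", format = "parquet", partitioning = "cyl")

# The result is a directory with one subdirectory of files per value of `cyl`,
# which can then be opened lazily and queried with dplyr verbs.
ds <- open_dataset("mtcars_ds")
ds %>%
  filter(cyl == 6) %>%
  select(mpg, hp) %>%
  collect()
```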
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]