jonkeane commented on a change in pull request #10014:
URL: https://github.com/apache/arrow/pull/10014#discussion_r613502676

##########
File path: r/vignettes/dataset.Rmd
##########
@@ -322,9 +330,9 @@ by calling `write_dataset()` on it:
 write_dataset(ds, "nyc-taxi/feather", format = "feather")
 ```
 
-Next, let's imagine that the "payment_type" column is something we often filter on,
+Next, let's imagine that the `payment_type` column is something we often filter on,
 so we want to partition the data by that variable. By doing so we ensure that a filter like
-`payment_type == 3` will touch only a subset of files where payment_type is always 3.
+`payment_type == 3` will touch only a subset of files where `payment_type `is always 3.

Review comment:
You didn't touch the example below that does

```
ds %>%
  filter(payment_type == 3) %>%
  write_dataset("nyc-taxi/feather", format = "feather")
```

But that is no longer valid since we don't autocast numerics to strings anymore. The `group_by`/`write_dataset` will all work, but the query with a filter must have the quotes around it now. And we should probably update that in the text as well to avoid confusion, yeah?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org