nealrichardson commented on a change in pull request #10765: URL: https://github.com/apache/arrow/pull/10765#discussion_r682605964
##########
File path: r/vignettes/dataset.Rmd
##########
@@ -159,37 +171,37 @@ See $metadata for additional Schema metadata
 The other form of partitioning currently supported is [Hive](https://hive.apache.org/)-style, in which the partition variable names are included in the path segments.
-If we had saved our files in paths like
+If you had saved your files in paths like:

 ```
 year=2009/month=01/data.parquet
 year=2009/month=02/data.parquet
 ...
 ```

-we would not have had to provide the names in `partitioning`:
-we could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
+you would not have had to provide the names in `partitioning`;
+you could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
 would have been detected automatically.

 ## Querying the dataset

-Up to this point, we haven't loaded any data: we have walked directories to find
-files, we've parsed file paths to identify partitions, and we've read the
-headers of the Parquet files to inspect their schemas so that we can make sure
-they all line up.
+Up to this point, you haven't loaded any data. You've walked directories to find

Review comment:
```suggestion
Up to this point, you haven't loaded any data. You've walked directories to find
```
