[GitHub] [arrow-cookbook] ianmcook commented on a change in pull request #1: Initial content for Arrow Cookbook for Python and R

GitBox Thu, 22 Jul 2021 09:56:28 -0700


ianmcook commented on a change in pull request #1:
URL: https://github.com/apache/arrow-cookbook/pull/1#discussion_r674987774




##########
File path: r/content/reading_and_writing_data.Rmd
##########
@@ -0,0 +1,255 @@
+# Reading and Writing Data
+
+This chapter contains recipes related to reading and writing data from disk 
using Apache Arrow.
+
+## Reading and Writing Parquet Files
+
+### Writing a Parquet file
+
+You can write Parquet files to disk using `arrow::write_parquet()`.
+```{r, write_parquet}
+# Create table
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = 
c(99, 97, 99)))
+# Write to Parquet
+write_parquet(my_table, "my_table.parquet")
+```
+```{r, test_write_parquet, opts.label = "test"}
+test_that("write_parquet chunk works as expected", {
+  expect_true(file.exists("my_table.parquet"))
+})
+```
+ 
+### Reading a Parquet file
+
+Given a Parquet file, it can be read back to an Arrow Table by using 
`arrow::read_parquet()`.
+
+```{r, read_parquet}
+parquet_tbl <- read_parquet("my_table.parquet")
+head(parquet_tbl)
+```
+```{r, test_read_parquet, opts.label = "test"}
+test_that("read_parquet works as expected", {
+  expect_equivalent(dplyr::collect(parquet_tbl), tibble::tibble(group = c("A", 
"B", "C"), score = c(99, 97, 99)))
+})
+```
+
+If the argument `as_data_frame` was set to `TRUE` (the default), the file was 
read in as a `data.frame` object.
+
+```{r, read_parquet_2}
+class(parquet_tbl)
+```
+```{r, test_read_parquet_2, opts.label = "test"}
+test_that("read_parquet_2 works as expected", {
+  expect_s3_class(parquet_tbl, "data.frame")
+})
+```
+If you set `as_data_frame` to `FALSE`, the file will be read in as an Arrow 
Table.
+
+```{r, read_parquet_table}
+my_table_arrow_table <- read_parquet("my_table.parquet", as_data_frame = FALSE)
+head(my_table_arrow_table)
+```
+
+```{r, read_parquet_table_class}
+class(my_table_arrow_table)
+```
+```{r, test_read_parquet_table_class, opts.label = "test"}
+test_that("read_parquet_table_class works as expected", {
+  expect_s3_class(my_table_arrow_table, "Table")
+})
+```
+

Review comment:
       Provide some explanation here about why one might want to read in a file 
as an Arrow Table instead of an R data frame (lower memory usage, faster 
performance, etc.) and say that you can always convert an Arrow Table to an R 
data frame by calling either `as.data.frame()` or `dplyr::collect()`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-cookbook] ianmcook commented on a change in pull request #1: Initial content for Arrow Cookbook for Python and R

Reply via email to