This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new 6c81e59 [R] - Flight recipes (#90)
6c81e59 is described below
commit 6c81e59a8bf56fe4d50d2d3471d1c14b0cb8ee5d
Author: Nic Crane <[email protected]>
AuthorDate: Thu Oct 28 12:23:46 2021 +0300
[R] - Flight recipes (#90)
* Flight recipes
* Add code to make examples self-contained
* "discussion" -> "see also"
* In in solution headings
* Use sentence case and remove "ing" from first verb in title
* Mention it's a pyarrow thing
* Update r/content/flight.Rmd
Co-authored-by: Weston Pace <[email protected]>
---
r/content/_bookdown.yml | 3 +-
r/content/flight.Rmd | 104 +++++++++++++++++++++++++++++++++
r/content/reading_and_writing_data.Rmd | 18 +++---
3 files changed, 115 insertions(+), 10 deletions(-)
diff --git a/r/content/_bookdown.yml b/r/content/_bookdown.yml
index 03299b2..c2aab02 100644
--- a/r/content/_bookdown.yml
+++ b/r/content/_bookdown.yml
@@ -11,5 +11,6 @@ rmd_files: [
"creating_arrow_objects.Rmd",
"specify_data_types_and_schemas.Rmd",
"arrays.Rmd",
- "tables.Rmd"
+ "tables.Rmd",
+ "flight.Rmd"
]
diff --git a/r/content/flight.Rmd b/r/content/flight.Rmd
new file mode 100644
index 0000000..44c29be
--- /dev/null
+++ b/r/content/flight.Rmd
@@ -0,0 +1,104 @@
+# Flight
+
+## Introduction
+
+Flight is a general-purpose client-server framework for high-performance
+transport of large datasets over network interfaces, built as part of the
+Apache Arrow project.
+
+Flight allows for highly efficient data transfer as it:
+
+* removes the need to serialize and deserialize data during transfer
+* allows for parallel data streaming
+* is highly optimized to take advantage of Arrow’s columnar format
+
+The arrow package provides methods for connecting to Flight RPC servers to send
+and receive data.
+
+Note that the Flight implementation in the R package depends on
+[PyArrow](https://arrow.apache.org/docs/python/), which is called via
+[reticulate](https://rstudio.github.io/reticulate/). This is quite different
+from the other capabilities in the R package, nearly all of which are
+implemented directly on top of the Arrow C++ library.
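+
+Because of this, the Flight functions only work if reticulate can find a
+Python environment with PyArrow installed. The chunk below is a minimal sketch
+of checking for it before connecting, using
+`reticulate::py_module_available()`; the `has_pyarrow` variable name is just
+for illustration.
+
+```{r, eval = FALSE}
+# Check whether reticulate can import the Python pyarrow module
+has_pyarrow <- reticulate::py_module_available("pyarrow")
+
+if (!has_pyarrow) {
+  # PyArrow must be installed in the Python environment that reticulate uses,
+  # e.g. via reticulate::py_install("pyarrow") or your usual Python tooling
+  message("PyArrow not found; Flight functions in arrow will not work.")
+}
+```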
+
+## Connect to a Flight server
+
+You want to connect to a Flight server running on a specified host and port.
+
+### Solution
+
+```{r, eval = FALSE}
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+```
+
+### See also
+
+For an example of how to set up a Flight server from R, see
+[the Flight vignette](https://arrow.apache.org/docs/r/articles/flight.html).
+
+## Send data to a Flight server
+
+You want to send data that you have in memory to a Flight server.
+
+### Solution
+
+```{r, eval = FALSE}
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Send the data
+flight_put(
+ local_client,
+ data = airquality,
+ path = "pollution_data"
+)
+```
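+
+By default, `flight_put()` replaces any data already stored at `path`. If you
+would rather get an error than silently overwrite existing data, you can set
+`overwrite = FALSE`. A minimal sketch, assuming the same server and path as
+above:
+
+```{r, eval = FALSE}
+library(arrow)
+
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Errors instead of replacing the data if "pollution_data" already exists
+flight_put(
+  local_client,
+  data = airquality,
+  path = "pollution_data",
+  overwrite = FALSE
+)
+```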
+
+## Check what resources exist on a Flight server
+
+You want to see what paths are available on a Flight server.
+
+### Solution
+
+```{r, eval = FALSE}
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Retrieve path listing
+list_flights(local_client)
+```
+
+```{r, eval = FALSE}
+# [1] "pollution_data"
+```
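+
+If you only need to know whether one particular path exists, for example
+before calling `flight_get()`, you can use `flight_path_exists()`, which
+returns a single `TRUE` or `FALSE`. A minimal sketch, reusing the path from
+the previous recipe:
+
+```{r, eval = FALSE}
+library(arrow)
+
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Check for one specific path rather than listing all of them
+flight_path_exists(local_client, "pollution_data")
+```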
+
+
+## Retrieve data from a Flight server
+
+You want to retrieve data stored at a specified path on a Flight server.
+
+### Solution
+
+```{r, eval = FALSE}
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Retrieve data
+flight_get(
+ local_client,
+ "pollution_data"
+)
+```
+
+```{r, eval = FALSE}
+# Table
+# 153 rows x 6 columns
+# $Ozone <int32>
+# $Solar.R <int32>
+# $Wind <double>
+# $Temp <int32>
+# $Month <int32>
+# $Day <int32>
+#
+# See $metadata for additional Schema metadata
+```
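+
+`flight_get()` returns an Arrow Table. If you want to work with the data as a
+regular R data frame, you can convert it with `as.data.frame()`. A minimal
+sketch (the variable names are illustrative):
+
+```{r, eval = FALSE}
+library(arrow)
+
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Retrieve the data as an Arrow Table, then convert it to a data frame
+pollution_table <- flight_get(local_client, "pollution_data")
+pollution_df <- as.data.frame(pollution_table)
+```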
diff --git a/r/content/reading_and_writing_data.Rmd b/r/content/reading_and_writing_data.Rmd
index 8cd2e36..f18f5b0 100644
--- a/r/content/reading_and_writing_data.Rmd
+++ b/r/content/reading_and_writing_data.Rmd
@@ -10,7 +10,7 @@ There are a number of circumstances in which you may want to read in the data as
* you want faster performance from your `dplyr` queries
* you want to be able to take advantage of Arrow's compute functions
-## Converting from a data frame to an Arrow Table
+## Convert from a data frame to an Arrow Table
You want to convert an existing `data.frame` or `tibble` object into an Arrow Table.
@@ -26,7 +26,7 @@ test_that("table_create_from_df chunk works as expected", {
})
```
-## Converting data from an Arrow Table to a data frame
+## Convert data from an Arrow Table to a data frame
You want to convert an Arrow Table to a data frame to view the data or work with it
in your usual analytics pipeline. You can use either `as.data.frame()` or
@@ -44,7 +44,7 @@ test_that("asdf_table chunk works as expected", {
})
```
-## Writing a Parquet file
+## Write a Parquet file
You want to write Parquet files to disk.
@@ -62,7 +62,7 @@ test_that("write_parquet chunk works as expected", {
})
```
-## Reading a Parquet file
+## Read a Parquet file
You want to read a Parquet file.
@@ -192,7 +192,7 @@ test_that("read_feather chunk works as expected", {
unlink("my_table.arrow")
```
-## Write Streaming IPC Files
+## Write streaming IPC files
You want to write to the IPC stream format.
@@ -215,7 +215,7 @@ test_that("write_ipc_stream chunk works as expected", {
})
```
-## Read Streaming IPC Files
+## Read streaming IPC files
You want to read from the IPC stream format.
@@ -233,7 +233,7 @@ test_that("read_ipc_stream chunk works as expected", {
unlink("my_table.arrows")
```
-## Reading and Writing CSV files
+## Read and write CSV files
You can use `write_csv_arrow()` to save an Arrow Table to disk as a CSV.
@@ -293,7 +293,7 @@ unlink(tf)
```
-## Write Partitioned Data
+## Write partitioned data
You want to save data to disk in partitions based on columns in the data.
@@ -325,7 +325,7 @@ Each of these folders contains 1 or more Parquet files containing the relevant p
list.files("airquality_partitioned/Month=5/Day=10")
```
-## Reading Partitioned Data
+## Read partitioned data
You want to read partitioned data.