[arrow-cookbook] branch main updated: [R] - Flight recipes (#90)

thisisnic Thu, 28 Oct 2021 02:23:55 -0700

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git



The following commit(s) were added to refs/heads/main by this push:
     new 6c81e59  [R] - Flight recipes (#90)
6c81e59 is described below

commit 6c81e59a8bf56fe4d50d2d3471d1c14b0cb8ee5d
Author: Nic Crane <[email protected]>
AuthorDate: Thu Oct 28 12:23:46 2021 +0300

    [R] - Flight recipes (#90)
    
    * Flight recipes
    
    * Add code to make examples self-contained
    
    * "discussion" -> "see also"
    
    * In in solution headings
    
    * Use sentence case and remove "ing" from first verb in title
    
    * Mention it's a pyarrow thing
    
    * Update r/content/flight.Rmd
    
    Co-authored-by: Weston Pace <[email protected]>
    
---
 r/content/_bookdown.yml                |   3 +-
 r/content/flight.Rmd                   | 104 +++++++++++++++++++++++++++++++++
 r/content/reading_and_writing_data.Rmd |  18 +++---
 3 files changed, 115 insertions(+), 10 deletions(-)

diff --git a/r/content/_bookdown.yml b/r/content/_bookdown.yml
index 03299b2..c2aab02 100644
--- a/r/content/_bookdown.yml
+++ b/r/content/_bookdown.yml
@@ -11,5 +11,6 @@ rmd_files: [
   "creating_arrow_objects.Rmd",
   "specify_data_types_and_schemas.Rmd",
   "arrays.Rmd",
-  "tables.Rmd"
+  "tables.Rmd",
+  "flight.Rmd"
 ]
diff --git a/r/content/flight.Rmd b/r/content/flight.Rmd
new file mode 100644
index 0000000..44c29be
--- /dev/null
+++ b/r/content/flight.Rmd
@@ -0,0 +1,104 @@
+# Flight
+
+## Introduction
+
+Flight is a general-purpose client-server framework for high performance 
+transport of large datasets over network interfaces, built as part of the 
+Apache Arrow project.
+
+Flight allows for highly efficient data transfer as it:
+
+* removes the need for serialization during data transfer
+* allows for parallel data streaming
+* is highly optimized to take advantage of Arrow’s columnar format.
+
+The arrow package provides methods for connecting to Flight RPC servers to send 
+and receive data.
+
+Note that the Flight implementation in the R package depends on 
+[PyArrow](https://arrow.apache.org/docs/python/), which is called via 
+[reticulate](https://rstudio.github.io/reticulate/). This is quite different 
+from the other capabilities in the R package, nearly all of which are 
+implemented directly.
+
+## Connect to a Flight server
+
+You want to connect to a Flight server running on a specified host and port.
+
+### Solution
+
+```{r, eval = FALSE}
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+```
+
+### See also
+
+For an example of how to set up a Flight server from R, see 
+[the Flight vignette](https://arrow.apache.org/docs/r/articles/flight.html).
+
+## Send data to a Flight server
+
+You want to send data that you have in memory to a Flight server.
+
+### Solution
+
+```{r, eval = FALSE}
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Send the data
+flight_put(
+  local_client,
+  data = airquality,
+  path = "pollution_data"
+)
+```
+
+## Check what resources exist on a Flight server
+
+You want to see what paths are available on a Flight server.
+
+### Solution
+
+```{r, eval = FALSE}
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Retrieve path listing
+list_flights(local_client)
+```
+
+```{r}
+# [1] "pollution_data"
+```
+
+
+## Retrieve data from a Flight server
+
+You want to retrieve data on a Flight server from a specified path.
+
+### Solution
+
+```{r, eval = FALSE}
+# Connect to the Flight server
+local_client <- flight_connect(host = "127.0.0.1", port = 8089)
+
+# Retrieve data
+flight_get(
+  local_client,
+  "pollution_data"
+)
+```
+
+```{r, eval = FALSE}
+# Table
+# 153 rows x 6 columns
+# $Ozone <int32>
+# $Solar.R <int32>
+# $Wind <double>
+# $Temp <int32>
+# $Month <int32>
+# $Day <int32>
+# 
+# See $metadata for additional Schema metadata
+```
diff --git a/r/content/reading_and_writing_data.Rmd b/r/content/reading_and_writing_data.Rmd
index 8cd2e36..f18f5b0 100644
--- a/r/content/reading_and_writing_data.Rmd
+++ b/r/content/reading_and_writing_data.Rmd
@@ -10,7 +10,7 @@ There are a number of circumstances in which you may want to read in the data as
 * you want faster performance from your `dplyr` queries
 * you want to be able to take advantage of Arrow's compute functions
 
-## Converting from a data frame to an Arrow Table
+## Convert from a data frame to an Arrow Table
 
 You want to convert an existing `data.frame` or `tibble` object into an Arrow Table.
 
@@ -26,7 +26,7 @@ test_that("table_create_from_df chunk works as expected", {
 })
 ```
 
-## Converting data from an Arrow Table to a data frame
+## Convert data from an Arrow Table to a data frame
 
 You want to convert an Arrow Table to a data frame to view the data or work with it
 in your usual analytics pipeline.  You can use either `as.data.frame()` or 
@@ -44,7 +44,7 @@ test_that("asdf_table chunk works as expected", {
 })
 ```
 
-## Writing a Parquet file
+## Write a Parquet file
 
 You want to write Parquet files to disk.
 
@@ -62,7 +62,7 @@ test_that("write_parquet chunk works as expected", {
 })
 ```
  
-## Reading a Parquet file
+## Read a Parquet file
 
 You want to read a Parquet file.
 
@@ -192,7 +192,7 @@ test_that("read_feather chunk works as expected", {
 unlink("my_table.arrow")
 ```
 
-## Write Streaming IPC Files
+## Write streaming IPC files
 
 You want to write to the IPC stream format.
 
@@ -215,7 +215,7 @@ test_that("write_ipc_stream chunk works as expected", {
 })
 ```
 
-## Read Streaming IPC Files
+## Read streaming IPC files
 
 You want to read from the IPC stream format.
 
@@ -233,7 +233,7 @@ test_that("read_ipc_stream chunk works as expected", {
 unlink("my_table.arrows")
 ```
 
-## Reading and Writing CSV files 
+## Read and write CSV files 
 
 You can use `write_csv_arrow()` to save an Arrow Table to disk as a CSV.
 
@@ -293,7 +293,7 @@ unlink(tf)
 ```
 
 
-## Write Partitioned Data
+## Write partitioned data
 
 You want to save data to disk in partitions based on columns in the data.
 
@@ -325,7 +325,7 @@ Each of these folders contains 1 or more Parquet files containing the relevant p
 list.files("airquality_partitioned/Month=5/Day=10")
 ```
 
-## Reading Partitioned Data
+## Read partitioned data
 
 You want to read partitioned data.
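
Outside the diff itself, the four recipes added in r/content/flight.Rmd can be read as one round trip: connect, send, list, retrieve. The following is only a minimal sketch of that sequence, not part of the commit. It assumes a Flight server is already listening on 127.0.0.1:8089 (the host and port used in the recipes) and that PyArrow is available to reticulate. The `round_trip()` helper and the `tryCatch()` wrapper are hypothetical names introduced here for illustration.

```r
library(arrow)

# Hypothetical helper (not part of the commit) that chains the four recipes.
# Assumption: a Flight server is already running at the given host and port.
round_trip <- function(host = "127.0.0.1", port = 8089) {
  # Connect to the Flight server
  client <- flight_connect(host = host, port = port)

  # Send an in-memory data frame to the server under a named path
  flight_put(client, data = airquality, path = "pollution_data")

  # List the paths the server currently exposes
  paths <- list_flights(client)

  # Retrieve the data again from the same path
  tbl <- flight_get(client, "pollution_data")

  list(paths = paths, table = tbl)
}

# If no server is reachable, or PyArrow is missing, report the problem and
# return NULL instead of stopping the script.
result <- tryCatch(
  round_trip(),
  error = function(e) {
    message("Flight round trip failed: ", conditionMessage(e))
    NULL
  }
)
```

When the server is reachable, `result$paths` should include "pollution_data" and `result$table` holds the retrieved data. Otherwise `result` is `NULL` and the error message is printed.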
