thisisnic commented on code in PR #14514:
URL: https://github.com/apache/arrow/pull/14514#discussion_r1027961179


##########
r/vignettes/metadata.Rmd:
##########
@@ -0,0 +1,82 @@
+---
+title: "Metadata"
+description: > 
+  Learn how Arrow uses Schemas to document structure of data objects, 
+  and how R metadata are supported in Arrow
+output: rmarkdown::html_vignette
+---
+
+This article describes the various data and metadata object types supplied by 
`arrow`, and documents how these objects are structured. 
+
+```{r include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+```
+
+## Arrow metadata classes
+
+The `arrow` package defines the following classes for representing metadata:
+
+- A `Schema` is a list of `Field` objects used to describe the structure of a 
tabular data object; where
+- A `Field` specifies a character string name and a `DataType`; and
+- A `DataType` is an attribute controlling how values are represented
+
+Consider this:
+
+```{r}
+df <- data.frame(x = 1:3, y = c("a", "b", "c"))
+tb <- arrow_table(df)
+tb$schema
+```
+
+The schema that has been automatically inferred could also be manually created:
+
+```{r}
+schema(
+  field(name = "x", type = int32()),
+  field(name = "y", type = utf8())
+)
+```
+
+The `schema()` function allows the following shorthand to define fields:
+
+```{r}
+schema(x = int32(), y = utf8())
+```
+
+Sometimes it is important to specify the schema manually, particularly if you 
want fine grained control over the Arrow data types:
+
+```{r}
+arrow_table(df, schema = schema(x = int64(), y = utf8()))
+arrow_table(df, schema = schema(x = float64(), y = utf8()))
+```
+
+
+## R object attributes
+
+Arrow supports custom key-value metadata attached to Schemas. When we convert 
a `data.frame` to an Arrow Table or RecordBatch, the package stores any 
`attributes()` attached to the columns of the `data.frame` in the Arrow object 
Schema. Attributes added to objects in this fasnion are stored under the `r` 
key, as shown below:

Review Comment:
   ```suggestion
   Arrow supports custom key-value metadata attached to Schemas. When we 
convert a `data.frame` to an Arrow Table or RecordBatch, the package stores any 
`attributes()` attached to the columns of the `data.frame` in the Arrow object 
Schema. Attributes added to objects in this fashion are stored under the `r` 
key, as shown below:
   ```



##########
r/vignettes/metadata.Rmd:
##########
@@ -0,0 +1,82 @@
+Sometimes it is important to specify the schema manually, particularly if you 
want fine grained control over the Arrow data types:

Review Comment:
   ```suggestion
   Sometimes it is important to specify the schema manually, particularly if 
you want fine-grained control over the Arrow data types:
   ```


