djnavarro commented on code in PR #14514:
URL: https://github.com/apache/arrow/pull/14514#discussion_r1012363444
##########
r/vignettes/data_objects.Rmd:
##########
@@ -0,0 +1,206 @@
+---
+title: "Data objects"
+description: >
+ Learn about Scalar, Array, Table, and Dataset objects in `arrow`
+ (among others), how they relate to each other, as well as their
+ relationships to familiar R objects like data frames and vectors
+output: rmarkdown::html_vignette
+---
+
+This article describes the various data object types supplied by `arrow`, and documents how these objects are structured.
+
+```{r include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+```
+
+The `arrow` package supplies several object classes that are used to represent data. `RecordBatch`, `Table`, and `Dataset` objects are two-dimensional rectangular data structures used to store tabular data. For columnar, one-dimensional data, the `Array` and `ChunkedArray` classes are provided. Finally, `Scalar` objects represent individual values. The table below summarizes these objects and shows how you can create new instances using the [`R6`](https://r6.r-lib.org/) class object, as well as convenience functions that provide the same functionality in a more traditional R-like fashion:
+
+| Dim | Class          | How to create an instance         | Convenience function            |
+| --- | -------------- | --------------------------------- | ------------------------------- |
+| 0   | `Scalar`       | `Scalar$create(value, type)`      |                                 |
+| 1   | `Array`        | `Array$create(vector, type)`      |                                 |
+| 1   | `ChunkedArray` | `ChunkedArray$create(..., type)`  | `chunked_array(..., type)`      |
+| 2   | `RecordBatch`  | `RecordBatch$create(...)`         | `record_batch(...)`             |
+| 2   | `Table`        | `Table$create(...)`               | `arrow_table(...)`              |
+| 2   | `Dataset`      | `Dataset$create(sources, schema)` | `open_dataset(sources, schema)` |
+
+Later in the article we'll look at each of these in more detail.
+
+For now we note that each of these object classes corresponds to a class of the same name in the underlying Arrow C++ library. It is also worth mentioning that the `arrow` package defines classes that do not exist in the C++ library, including:
+
+* `ArrowDatum`: inherited by `Scalar`, `Array`, and `ChunkedArray`
+* `ArrowTabular`: inherited by `RecordBatch` and `Table`
+* `ArrowObject`: inherited by all Arrow objects
Review Comment:
Probably not! This is another case where I didn't really want to keep
them but felt obligated to, because these classes are currently listed on
the Get Started page:
https://arrow.apache.org/docs/r/articles/arrow.html#data-objects. As
happened with the R6/C++ classes, I've moved the list into this vignette to
make it less prominent than it currently is. Again, maybe the solution is to
delete it entirely or move it into a developer vignette.