djnavarro commented on code in PR #14514:
URL: https://github.com/apache/arrow/pull/14514#discussion_r1012480175
##########
r/vignettes/data_objects.Rmd:
##########
@@ -0,0 +1,206 @@
+---
+title: "Data objects"
+description: >
+ Learn about Scalar, Array, Table, and Dataset objects in `arrow`
+ (among others), how they relate to each other, as well as their
+ relationships to familiar R objects like data frames and vectors
+output: rmarkdown::html_vignette
+---
+
+This article describes the various data object types supplied by `arrow`, and documents how these objects are structured.
+
+```{r include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+```
+
+The `arrow` package supplies several object classes that are used to represent data. `RecordBatch`, `Table`, and `Dataset` objects are two-dimensional rectangular data structures used to store tabular data. For columnar, one-dimensional data, the `Array` and `ChunkedArray` classes are provided. Finally, `Scalar` objects represent individual values. The table below summarizes these objects and shows how you can create new instances using the [`R6`](https://r6.r-lib.org/) class object, as well as convenience functions that provide the same functionality in a more traditional R-like fashion:
+
+| Dim | Class          | How to create an instance                     | Convenience function                          |
+| --- | -------------- | ----------------------------------------------| --------------------------------------------- |
+| 0   | `Scalar`       | `Scalar$create(value, type)`                  |                                               |
+| 1   | `Array`        | `Array$create(vector, type)`                  |                                               |
+| 1   | `ChunkedArray` | `ChunkedArray$create(..., type)`              | `chunked_array(..., type)`                    |
+| 2   | `RecordBatch`  | `RecordBatch$create(...)`                     | `record_batch(...)`                           |
+| 2   | `Table`        | `Table$create(...)`                           | `arrow_table(...)`                            |
+| 2   | `Dataset`      | `Dataset$create(sources, schema)`             | `open_dataset(sources, schema)`               |
+
+Later in the article we'll look at each of these in more detail.
+
+For now we note that each of these object classes corresponds to a class of the same name in the underlying Arrow C++ library. It is also worth mentioning that the `arrow` package also defines classes that do not exist in the C++ library including:
+
+* `ArrowDatum`: inherited by `Scalar`, `Array`, and `ChunkedArray`
+* `ArrowTabular`: inherited by `RecordBatch` and `Table`
+* `ArrowObject`: inherited by all Arrow objects
Review Comment:
Sounds good to me. I chatted with @jonkeane about it this morning too, and
they independently made the same suggestion, which makes me think moving these
details to the dev vignettes is the right move.