thisisnic commented on code in PR #14514:
URL: https://github.com/apache/arrow/pull/14514#discussion_r1011820496


##########
r/vignettes/package_conventions.Rmd:
##########
@@ -0,0 +1,25 @@
+---
+title: "Package conventions"
+description: >
+  Learn how R6 classes are used in `arrow` to wrap the 
+  underlying C++ library, and when to use these objects
+  rather than the R-friendly wrapper functions
+output: rmarkdown::html_vignette
+---
+
+C++ is an object-oriented language, so the core logic of the Arrow C++ library 
is encapsulated in classes and methods. In the `arrow` R package, these classes 
are implemented as [`R6`](https://r6.r-lib.org) classes, most of which are 
exported from the namespace.
+
+## Naming conventions
+
+In order to match the C++ naming conventions, the `R6` classes are named in 
"TitleCase", e.g. `RecordBatch`. This makes it easy to look up the relevant C++ 
implementations in the [code](https://github.com/apache/arrow/tree/master/cpp) 
or [documentation](https://arrow.apache.org/docs/cpp/). To simplify things in 
R, the C++ library namespaces are generally dropped or flattened; that is, 
where the C++ library has `arrow::io::FileOutputStream`, it is just 
`FileOutputStream` in the R package. One exception is for the file readers, 
where the namespace is necessary to disambiguate. So `arrow::csv::TableReader` 
becomes `CsvTableReader`, and `arrow::json::TableReader` becomes 
`JsonTableReader`.
+
+Some of these classes are not meant to be instantiated directly; they may be 
base classes or other kinds of helpers. For those that you should be able to 
create, use the `$create()` method to instantiate an object. For example, `rb 
<- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))` will create a 
`RecordBatch`. Many of these factory methods that an R user might most often 
encounter also have a "snake_case" alias, in order to be more familiar for 
contemporary R users. So `record_batch(int = 1:10, dbl = as.numeric(1:10))` 
would do the same as `RecordBatch$create()` above.
+
+The typical user of the `arrow` R package may never deal directly with the 
`R6` objects. We provide more R-friendly wrapper functions as a higher-level 
interface to the C++ library. An R user can call `read_parquet()` without 
knowing or caring that they're instantiating a `ParquetFileReader` object and 
calling the `$ReadFile()` method on it. The classes are there and available to 
the advanced programmer who wants fine-grained control over how the C++ library 
is used.
+
+## Further reading
+
+- [Documentation for the Arrow C++ library](https://arrow.apache.org/docs/cpp/)
+- [API reference for the Arrow C++ 
classes](https://arrow.apache.org/docs/cpp/api.html)
+
+

Review Comment:
   I'm feeling a little bit resistant about this new vignette, if only because 
having to explain "when to use these objects rather than the R-friendly wrapper 
functions" might in some cases just be a symptom that we need to make more 
wrapper functions ;)  
   
   I think this content is well-explained, but can we chat a bit about who we 
are aiming it at and why we're including it?  I want to make sure that we do 
need it before we incorporate it, as it's more content to maintain.



