Dewey Dunnington created ARROW-15168:
----------------------------------------
Summary: [R] Add S3 generics to create main Arrow objects
Key: ARROW-15168
URL: https://issues.apache.org/jira/browse/ARROW-15168
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Dewey Dunnington
Right now we create Tables, RecordBatches, ChunkedArrays, and Arrays using the
corresponding {{$create()}} functions (or a few shortcut functions). This works
well for converting other Arrow or base R types to Arrow objects but doesn’t
work well for objects in other packages (e.g., sf). This is related to
ARROW-14378 in that it provides a mechanism for other packages to support
writing objects to Arrow in a more Arrow-native form instead of serializing
attributes that are unlikely to be readable in other packages. Many of these
issues came up when experimenting with {{carrow}} while trying to provide
seamless arrow package compatibility for S3 objects that wrap external pointers
to C API data structures. S3 is a good way to do this because the other package
doesn't have to put arrow, which is a heavy dependency, in {{Imports}}.
For argument’s sake I’ll propose adding the following generics:
- {{as_arrow_array(x, type = NULL)}} -> {{Array}}
- {{as_arrow_chunked_array(x, type = NULL)}} -> {{ChunkedArray}}
- {{as_arrow_record_batch(x, schema = NULL)}} -> {{RecordBatch}}
- {{as_arrow_table(x, schema = NULL)}} -> {{Table}}
- {{as_arrow_data_type(x)}} -> {{DataType}}
- {{as_arrow_record_batch_reader(x, schema = NULL)}} -> {{RecordBatchReader}}
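
As a rough illustration of the first generic in the list, here is a minimal sketch. The name {{as_arrow_array()}} and the {{type}} argument come from the proposal above; the {{...}} argument, the bodies of the default and {{Array}} methods, and the {{my_pkg_celsius}} class are assumptions made only for this example and are not a settled design.

```r
library(arrow)

# Proposed generic (signature from the list above; `...` is an assumption).
as_arrow_array <- function(x, ..., type = NULL) {
  UseMethod("as_arrow_array")
}

# Assumed default: fall back to the existing Array$create() path.
as_arrow_array.default <- function(x, ..., type = NULL) {
  Array$create(x, type = type)
}

# Assumed method for objects that are already Arrays: return as-is,
# or cast when a type is requested.
as_arrow_array.Array <- function(x, ..., type = NULL) {
  if (is.null(type)) x else x$cast(type)
}

# Hypothetical class from some other package. Its method strips the
# class and hands the underlying double vector to the default method.
as_arrow_array.my_pkg_celsius <- function(x, ..., type = NULL) {
  as_arrow_array(unclass(x), type = type)
}

temps <- structure(c(20.5, 22.1), class = "my_pkg_celsius")
arr <- as_arrow_array(temps)
```

If arrow exported such a generic, the other package could presumably list arrow in {{Suggests}} and register its method with a delayed {{S3method(arrow::as_arrow_array, my_pkg_celsius)}} directive in its NAMESPACE (supported since R 3.6.0), which is what would keep arrow out of {{Imports}}.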
I’ll note that we use {{as_adq()}} internally for similar reasons (to convert a
few different object types into an arrow dplyr query when that’s the data
structure we need).
As part of this ticket, if we choose to move forward, we should implement the
default methods with some internal consistency (i.e., somebody wanting to
provide Arrow support in a package probably only has to implement
{{as_arrow_array()}} to get most support).
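
One way the internal consistency described above could work is for the default methods of the higher-level generics to delegate to {{as_arrow_array()}}. The sketch below shows this for {{as_arrow_chunked_array()}} only. The method bodies, the {{...}} argument, and the {{my_pkg_id}} class are assumptions for illustration, not the implementation this ticket will necessarily end up with.

```r
library(arrow)

as_arrow_array <- function(x, ..., type = NULL) {
  UseMethod("as_arrow_array")
}

as_arrow_array.default <- function(x, ..., type = NULL) {
  Array$create(x, type = type)
}

as_arrow_chunked_array <- function(x, ..., type = NULL) {
  UseMethod("as_arrow_chunked_array")
}

# Assumed default: build a one-chunk ChunkedArray from whatever
# as_arrow_array() returns, so any class with an as_arrow_array()
# method is covered here without extra work.
as_arrow_chunked_array.default <- function(x, ..., type = NULL) {
  ChunkedArray$create(as_arrow_array(x, type = type))
}

# Hypothetical downstream class that implements only as_arrow_array().
as_arrow_array.my_pkg_id <- function(x, ..., type = NULL) {
  as_arrow_array(unclass(x), type = type)
}

ids <- structure(c(1L, 2L, 3L), class = "my_pkg_id")

# Dispatches to the default chunked-array method, which in turn
# calls the my_pkg_id method of as_arrow_array().
chunked <- as_arrow_chunked_array(ids)
```

The same pattern could presumably extend to {{as_arrow_table()}} and {{as_arrow_record_batch()}} by converting each column through {{as_arrow_array()}}.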