nealrichardson opened a new pull request, #13210:
URL: https://github.com/apache/arrow/pull/13210

   * Pushes KeyValueMetadata (KVM) handling into ExecPlan so that Run() 
preserves the R metadata we want.
   * Also pushes special handling for a kind of collapsed query from collect() 
into Build(). 
   * Better encapsulates KVM for `$metadata` and `$r_metadata` so that, as a 
user/developer, you never have to touch the serialize/deserialize functions; 
you just have a list to work with. This is a slight API change, most noticeable 
if you were to `print(tab$metadata)`; better is to `print(str(tab$metadata))`.
   * Factors out a common utility in r/src for taking cpp11::strings (a named 
character vector) and producing arrow::KeyValueMetadata.
   
   The upshot of all of this is that we can push the ExecPlan evaluation into 
`as_record_batch_reader()`, and all that `collect()` does on top is read the 
RecordBatchReader into a Table/data.frame. This means that we can plug dplyr 
queries into anything else that expects a RecordBatchReader, and the result 
will be streaming (to the maximum extent possible, given the limitations of 
ExecPlan), without requiring you to `compute()` and materialize things first.
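
As a rough illustration of the streaming usage described above, here is a minimal sketch in R. It assumes a version of the arrow package that includes this change, so that `as_record_batch_reader()` accepts a dplyr query; the `mtcars` data, the filter, and the variable names are only placeholders for illustration and are not taken from the PR.

```r
library(arrow)
library(dplyr)

# Build a lazy dplyr query on an Arrow Table; nothing is evaluated yet.
query <- arrow_table(mtcars) %>%
  filter(cyl == 6) %>%
  select(mpg, cyl, wt)

# Assumption: with this change, the ExecPlan is evaluated behind the reader,
# so no compute()/collect() is needed before handing the query to a consumer.
reader <- as_record_batch_reader(query)

# Consume the stream one batch at a time; read_next_batch() returns NULL
# once the reader is exhausted.
n_rows <- 0
while (!is.null(batch <- reader$read_next_batch())) {
  n_rows <- n_rows + batch$num_rows
}
```

Anything that accepts a RecordBatchReader could take `reader` in place of the loop, which only stands in for a real consumer here.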

