nealrichardson opened a new pull request, #14676:
URL: https://github.com/apache/arrow/pull/14676

   A long time ago, dplyr expressions on Tables and RecordBatches were 
evaluated by calling compute functions on (Chunked)Arrays, calling Slice or 
Filter methods on the Tables/RBs, etc. So to make sure that all C++ bindings 
were exposed correctly, we needed to test that operations worked on both Tables 
and RecordBatches. 
   
   Today, everything goes through ExecPlans, and RecordBatches get wrapped in 
Tables when TableSourceNodes are created: 
https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L63. So as long 
as we are able to create a Table from a RecordBatch (tested elsewhere), the 
query evaluation is identical. This means we don't need to test every dplyr 
query twice.
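
   As a rough illustration of the equivalence described above (a sketch, not code from this PR), the snippet below builds a RecordBatch and a Table from the same data and runs one dplyr query against each. The data frame and the query are made up for the example, and it assumes the `arrow` and `dplyr` packages are installed.

```r
library(arrow)
library(dplyr)

# Hypothetical example data; not taken from the arrow test suite.
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))

batch <- record_batch(df)
tab <- arrow_table(df)

# A RecordBatch can be wrapped in a Table explicitly.
wrapped <- Table$create(batch)
stopifnot(inherits(wrapped, "Table"))
stopifnot(wrapped$num_rows == batch$num_rows)

# Run the same query against both inputs and compare the collected results.
from_batch <- batch %>% filter(x > 2) %>% select(y) %>% collect()
from_table <- tab %>% filter(x > 2) %>% select(y) %>% collect()

stopifnot(isTRUE(all.equal(from_batch, from_table)))
```

   If the two results ever differed, that would point at the RecordBatch-to-Table conversion rather than at the query itself, which is why testing that conversion once is enough.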
   
   On my machine, this cuts a little more than 1/3 off the running time of 
the dplyr tests, or about 20 seconds. The bigger benefit IMO is that when one 
of these expectations fails, you'll only see the failure once instead of 
twice, so it will be easier to tell what went wrong. 


