nealrichardson opened a new pull request, #14676:
URL: https://github.com/apache/arrow/pull/14676
A long time ago, dplyr expressions on Tables and RecordBatches were evaluated by calling compute functions on (Chunked)Arrays, calling Slice or Filter methods on the Tables/RecordBatches, and so on. To make sure that all C++ bindings were exposed correctly, we needed to test that operations worked on both Tables and RecordBatches.

Today, everything goes through ExecPlans, and RecordBatches get wrapped in Tables when creating TableSourceNodes: https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L63. So as long as we can create a Table from a RecordBatch (which is tested elsewhere), query evaluation is identical for both, and we don't need to test every dplyr query twice.

On my machine, this cuts a little more than a third off the running time of the dplyr tests, or about 20 seconds. The bigger benefit IMO is that when one of these expectations fails, you'll only see the failure once instead of twice, so it will be less confusing to figure out what's up.
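To illustrate why a single Table-based test is enough, here is a minimal sketch (not part of this PR) that runs the same dplyr query against an Arrow Table and against a RecordBatch built from the same data frame, then checks that the collected results match. It assumes the arrow (7.0.0 or later, for `arrow_table()`) and dplyr packages are installed; the data frame `df` and the helper `run_query()` are hypothetical, made up for illustration. With a single batch and no aggregation or join, row order should be preserved, so an exact comparison is reasonable here.

```r
# Minimal sketch (illustrative, not from the PR): the same dplyr query on a
# Table and on a RecordBatch should give the same result, since both are
# evaluated through the same ExecPlan-based path.
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical example data
df <- data.frame(x = 1:5, y = c("a", "b", "a", "b", "a"))

# Hypothetical helper: apply one query pipeline to any Arrow data source
run_query <- function(src) {
  src %>%
    filter(x > 2) %>%
    mutate(z = x * 2L) %>%
    collect()
}

from_table <- run_query(arrow_table(df))
from_batch <- run_query(record_batch(df))

# Compare as plain data frames so tibble-specific attributes don't matter
stopifnot(identical(as.data.frame(from_table), as.data.frame(from_batch)))
```

If the RecordBatch path ever diverged from the Table path, this comparison would fail; since the RecordBatch is wrapped in a Table before the plan runs, the second query exercises no code that the first one doesn't.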