nealrichardson opened a new pull request #10191:
URL: https://github.com/apache/arrow/pull/10191


   While discussing #10166 with @bkietz, we realized that we can already 
evaluate filter/project on a Table/RecordBatch by wrapping it in an 
InMemoryDataset and using the Dataset machinery, so I wanted to see how well 
that works. Mostly it does, with a few caveats:
   
   * dim (nrow) is not implemented for filtered datasets yet, though #10060 
will solve that
   * it looks like head/tail aren't handling projection correctly yet, or 
maybe something else is going on; either way, a couple of tests are still 
failing
   * with the existing array_expressions, you could supply an additional Array 
(or R data convertible to an Array) when doing `mutate()`; this is not 
currently implemented for Datasets (under the presumption that datasets are 
always huge). Maybe it would work, though I'm doubtful
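
   As a rough sketch of the idea (not the code in this PR), wrapping a Table in 
an InMemoryDataset and querying it through dplyr could look like the following. 
The column names and values are made up for illustration, and the example 
assumes an arrow build with dataset support and dplyr installed:

```r
library(arrow)
library(dplyr)

# Hypothetical in-memory data; any Table or RecordBatch should work the same way
tab <- Table$create(x = 1:5, y = c("a", "b", "c", "d", "e"))

# Wrap the Table so the Dataset machinery can evaluate filter/project on it
ds <- InMemoryDataset$create(tab)

# filter() and select() are recorded on the dataset query;
# collect() runs the scan and returns an R data frame
result <- ds %>%
  filter(x > 2) %>%
  select(y) %>%
  collect()
```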
   
   I've made only the minimal changes needed to get the tests passing (aside 
from the failures noted above). There's a lot more code that could be deleted 
if/when we go forward with this. That deletion/refactoring could be broken up 
into multiple PRs if we wanted. We should also do some benchmarking.
   
   cc @jonkeane @ianmcook 



