nealrichardson opened a new pull request #10191: URL: https://github.com/apache/arrow/pull/10191
Discussing with @bkietz on #10166, we realized that we can already evaluate filter/project on a Table/RecordBatch by wrapping it in an InMemoryDataset and using the Dataset machinery, so I wanted to see how well that works. Mostly it does, with a few caveats:

* dim (nrow) is not implemented for filtered datasets yet, though #10060 will solve that
* it looks like head/tail aren't currently handling projection correctly, or maybe something else is going on, because a couple of tests are failing
* with the existing array_expressions, you could supply an additional Array (or R data convertible to an Array) when doing `mutate()`; this is not currently implemented for Datasets (under the presumption that datasets are always huge). Maybe it would work, though I'm doubtful.

I've made only the minimal changes needed to try to get tests passing (aside from those). There's a lot more code that could be deleted if/when we go forward with this. That deletion/refactoring could be broken up into multiple PRs if we wanted. We should also do some benchmarking.

cc @jonkeane @ianmcook
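
For illustration, the wrapping approach described above might look roughly like the following from the user's side. This is a minimal sketch, not the code in this PR: it assumes the arrow R package exposes `InMemoryDataset$create()` and that the dplyr verbs `filter()`, `select()`, and `collect()` work on the resulting Dataset. The table `tab` and its columns `x` and `y` are made up for the example.

```r
library(arrow)
library(dplyr)

# Hypothetical example data; not taken from the PR.
tab <- Table$create(
  x = 1:5,
  y = c("a", "b", "c", "d", "e")
)

# Wrap the in-memory Table in a Dataset so that the Dataset
# machinery evaluates the filter and projection.
ds <- InMemoryDataset$create(tab)

result <- ds %>%
  filter(x > 2) %>%   # filter expression, evaluated by the Dataset scanner
  select(y) %>%       # projection to a single column
  collect()           # materialize the result as an R data frame

print(result)
```

The query is written the same way it would be against a file-backed Dataset; only the source differs.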
