alihan-synnada commented on issue #13620: URL: https://github.com/apache/datafusion/issues/13620#issuecomment-2514077118
[PoC Link](https://github.com/synnada-ai/datafusion-upstream/tree/feature/take_with_iter_poc)

**It requires a patched version of `arrow-buffer` that derives `Clone` for `BitIndexIterator`. The benchmark might be misleading because I had trouble with lifetimes and ended up using `Box::leak` as a last resort.**

I'm not very confident in how I set up the benchmark, but I think the results are promising. Note that selectivity only goes up to 30 because it's really slow beyond that point. Batch size is shown in log2 on the chart (i.e. batch size 13 means 2^13 *(8192)*). So it isn't really useful for the default batch size of 8192, but anything between 2^4 *(16)* and 2^10 *(1024)* might benefit from it.
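
The PoC code itself isn't reproduced here. Below is a minimal, std-only sketch of the general idea, assuming the PoC gathers values by walking the set-bit positions of a filter mask instead of first materializing an index array. `SetBitIndices` and `take_with_iter` are hypothetical names. `SetBitIndices` is a simplified stand-in for `arrow-buffer`'s `BitIndexIterator`, not its real implementation. Deriving `Clone` on it illustrates one plausible reason for the patch: the same selection can be replayed for each column of a batch.

```rust
/// Hypothetical stand-in for arrow-buffer's `BitIndexIterator`: yields the
/// positions of set bits in an LSB-first bitmap (Arrow's bit order).
/// Deriving `Clone` lets the same selection be replayed for several columns.
#[derive(Clone)]
struct SetBitIndices<'a> {
    bitmap: &'a [u8],
    pos: usize,
    len: usize,
}

impl<'a> SetBitIndices<'a> {
    fn new(bitmap: &'a [u8], len: usize) -> Self {
        assert!(len <= bitmap.len() * 8, "len exceeds bitmap size");
        Self { bitmap, pos: 0, len }
    }
}

impl Iterator for SetBitIndices<'_> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Test one bit at a time (simple, not word-at-a-time like a tuned version).
        while self.pos < self.len {
            let i = self.pos;
            self.pos += 1;
            if self.bitmap[i / 8] & (1u8 << (i % 8)) != 0 {
                return Some(i);
            }
        }
        None
    }
}

/// Hypothetical "take with iterator": gathers `values[i]` for each index the
/// iterator yields, without building an intermediate index array.
fn take_with_iter<T: Copy, I: Iterator<Item = usize>>(values: &[T], indices: I) -> Vec<T> {
    indices.map(|i| values[i]).collect()
}

fn main() {
    // Mask 0b0010_1101 selects rows 0, 2, 3 and 5 (LSB first).
    let mask = [0b0010_1101u8];
    let selection = SetBitIndices::new(&mask, 8);

    let ids = [10, 11, 12, 13, 14, 15, 16, 17];
    let scores = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5];

    // Clone the selection so every column is gathered with the same indices.
    let taken_ids = take_with_iter(&ids, selection.clone());
    let taken_scores = take_with_iter(&scores, selection);

    println!("{:?} {:?}", taken_ids, taken_scores); // [10, 12, 13, 15] [1.0, 2.0, 2.5, 3.5]
}
```

Cloning the iterator only copies a slice reference and two counters, so replaying it per column is cheap. Each replay still rescans the mask, though.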
