zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2907789055
> > But I think we can start with the basic optimization: use only the batch-size window to decide between bitmap and selector. Later, we can optimize further.
>
> This is an interesting idea and I think it is worth exploring
>
> > Maybe we can have only the selector in ReadPlan, but for the adaptive window size (currently fixed to the batch size), we can switch to bitmap if it's dense, as a first step...
>
> 👍
>
> Another thing that makes this tricky in my mind is that if `batch_size` is `8000`, the total number of `1`s in the mask needs to be `8000` -- the mask itself can be substantially larger (e.g. it could be `16000` and select every other row) 🤔

Very good point! @alamb

It's hard for us to reduce its overhead. Maybe we could set something like `max_bitmap_iterator`: when the bitmap iterator exceeds `max_bitmap_iterator`, we consume it first as an output batch, and then merge those batches at the end. But I am not sure whether this would perform worse than using the selector.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
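
To make the span-versus-count point concrete, here is a rough Rust sketch of the capped-window idea. It is not arrow-rs code: `Strategy`, `next_window`, `choose_strategy`, the `max_span` parameter (standing in for the proposed `max_bitmap_iterator`), and the `min_density` threshold are all hypothetical names chosen for illustration, and a plain `&[bool]` stands in for the real mask type.

```rust
/// Hypothetical per-window strategy; not an arrow-rs type.
#[derive(Debug, PartialEq)]
enum Strategy {
    Bitmap,
    Selector,
}

/// Scan `mask` from the start and return `(span, selected)`: how many mask
/// positions were consumed and how many of those were selected.
/// Scanning stops once `batch_size` rows are selected or once `max_span`
/// positions have been consumed (the assumed `max_bitmap_iterator` cap).
fn next_window(mask: &[bool], batch_size: usize, max_span: usize) -> (usize, usize) {
    let mut span = 0;
    let mut selected = 0;
    for &keep in mask.iter().take(max_span) {
        if selected == batch_size {
            break;
        }
        span += 1;
        if keep {
            selected += 1;
        }
    }
    (span, selected)
}

/// Pick the bitmap for dense windows and the selector for sparse ones.
/// `min_density` is an assumed tuning knob, not a measured value.
fn choose_strategy(span: usize, selected: usize, min_density: f64) -> Strategy {
    if span == 0 {
        return Strategy::Selector;
    }
    let density = selected as f64 / span as f64;
    if density >= min_density {
        Strategy::Bitmap
    } else {
        Strategy::Selector
    }
}
```

With a mask that selects every other row, `next_window(&mask, 8000, usize::MAX)` has to consume 16000 positions to collect 8000 rows. Capping the span at 10000 instead yields a partial window of 5000 selected rows, which would then need to be merged with later windows to form a full batch; whether that merge costs more than using the selector is exactly the open question.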