chloro-pn commented on PR #6154: URL: https://github.com/apache/arrow-rs/pull/6154#issuecomment-2256828520
> My concern with this approach (and with any of the "gc while filtering" approaches) is that it will work well for some use cases but be unavoidable overhead with other use cases.
>
> I think we need an API that allows callers to specify how they want the GC to be done
>
> What I have talked about with @XiangpengHao and a few others is making some sort of stateful filter API; Among other things such an API would give us a place to put options for filtering
>
> So maybe it would look like
>
> ```rust
> // create a new filterer
> let filterer = ArrayFilterer::new()
>   // specify that GC should look for empty buffers
>   .with_gc_strategy(GcStrategy::FindEmptyBuffers);
>
> // feed in arrays and their filters (BooleanArray)
> filterer.push(array1, filter1)?;
> filterer.push(array2, filter2)?;
>
> // Produce the final output array
> let filtered_array = filterer.build();
> ```

Agreed. Providing a new API instead of modifying the existing one leaves the choice of GC strategy to the user.
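
To make the shape of such a stateful filterer concrete, here is a rough, dependency-free sketch. It is not arrow-rs code: `ViewArray` is a toy stand-in for a view array (shared buffers plus `(buffer, offset, len)` views), `ArrayFilterer`, `with_gc_strategy`, and `GcStrategy::FindEmptyBuffers` are the hypothetical names from the quoted snippet, and `GcStrategy::Never` and the `&[bool]` filter are assumptions added only for illustration.

```rust
use std::sync::Arc;

/// Toy stand-in for a view array: shared buffers plus (buffer, offset, len) views.
#[derive(Clone, Debug)]
struct ViewArray {
    buffers: Vec<Arc<str>>,
    views: Vec<(usize, usize, usize)>,
}

impl ViewArray {
    /// Resolve row `i` to the string slice its view points at.
    fn value(&self, i: usize) -> &str {
        let (buf, off, len) = self.views[i];
        &self.buffers[buf][off..off + len]
    }
}

/// How `build` treats buffers after filtering (hypothetical variants).
#[derive(Clone, Copy, Debug, PartialEq)]
enum GcStrategy {
    /// Keep every input buffer, referenced or not (no GC overhead).
    Never,
    /// Drop buffers that no surviving view points into.
    FindEmptyBuffers,
}

/// Hypothetical stateful filterer: accumulates filtered rows, GCs once in `build`.
struct ArrayFilterer {
    strategy: GcStrategy,
    buffers: Vec<Arc<str>>,
    views: Vec<(usize, usize, usize)>,
}

impl ArrayFilterer {
    fn new() -> Self {
        Self { strategy: GcStrategy::Never, buffers: Vec::new(), views: Vec::new() }
    }

    fn with_gc_strategy(mut self, strategy: GcStrategy) -> Self {
        self.strategy = strategy;
        self
    }

    /// Append the rows of `array` whose `filter` entry is true.
    fn push(&mut self, array: &ViewArray, filter: &[bool]) -> Result<(), String> {
        if filter.len() != array.views.len() {
            return Err(format!(
                "filter has {} entries but array has {} rows",
                filter.len(),
                array.views.len()
            ));
        }
        // Buffers are shared (cheap Arc clones); views are re-based onto them.
        let base = self.buffers.len();
        self.buffers.extend(array.buffers.iter().cloned());
        for (&(buf, off, len), &keep) in array.views.iter().zip(filter) {
            if keep {
                self.views.push((base + buf, off, len));
            }
        }
        Ok(())
    }

    /// Produce the output array, applying the chosen GC strategy once.
    fn build(self) -> ViewArray {
        if self.strategy == GcStrategy::Never {
            return ViewArray { buffers: self.buffers, views: self.views };
        }
        // Mark buffers that are still referenced by at least one view.
        let mut used = vec![false; self.buffers.len()];
        for &(buf, _, _) in &self.views {
            used[buf] = true;
        }
        // Compact the buffer list and remember each buffer's new index.
        let mut remap = vec![usize::MAX; self.buffers.len()];
        let mut buffers = Vec::new();
        for (i, b) in self.buffers.into_iter().enumerate() {
            if used[i] {
                remap[i] = buffers.len();
                buffers.push(b);
            }
        }
        let views = self.views.into_iter().map(|(b, o, l)| (remap[b], o, l)).collect();
        ViewArray { buffers, views }
    }
}
```

The point of the sketch is only that the GC decision lives in one place (`build`) and is opt-in, so callers who do not want the overhead keep the default and pay nothing extra.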
