chloro-pn commented on PR #6154: URL: https://github.com/apache/arrow-rs/pull/6154#issuecomment-2256818105
My suggestion is to treat the **second** type of GC the same way as the current GC (the **third** type): as an independent method that users can choose to call. The **first** type of GC in this PR could go into a new filter method, tentatively called `filter2`, to distinguish it from the existing `filter` method.

The `filter2` method could return statistics collected during filtering, which would help users decide whether to run the other GC operations afterwards. For example, it could return the number of empty `buffer`s and the proportion of `buffers` that are empty, to indicate whether the **second** type of GC is worthwhile. We could probably also collect statistics that indicate whether to call the **third** type of GC, but I don't yet understand its implementation details; I need to read the source code first.

I'd fold the **first** type of GC into `filter2`, rather than expose it as a standalone method like the other types, because it is lightweight and can be done in the same pass as filtering. The **second** type, by contrast, gains nothing from being placed inside `filter` compared with calling it as an independent method.
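To make the proposal concrete, here is a minimal Rust sketch of a `filter2`-style function that records buffer statistics during the filtering pass. It does not use the arrow-rs API. `View`, `FilterStats`, `filter2` and the 0.5 threshold are all hypothetical, and a view is reduced to "inline" versus "points into buffer N".

```rust
/// Hypothetical, simplified view: short values are stored inline,
/// longer ones reference one of the array's data buffers.
#[derive(Clone, Copy, Debug, PartialEq)]
enum View {
    Inline,
    Ref { buffer_index: usize },
}

/// Hypothetical statistics a `filter2`-style method could return.
#[derive(Debug, PartialEq)]
struct FilterStats {
    total_buffers: usize,
    /// Buffers that no retained view references any more.
    empty_buffers: usize,
}

impl FilterStats {
    /// Fraction of buffers that are unreferenced (0.0 if there are none).
    fn empty_ratio(&self) -> f64 {
        if self.total_buffers == 0 {
            0.0
        } else {
            self.empty_buffers as f64 / self.total_buffers as f64
        }
    }
}

/// Keeps the views whose predicate entry is `true` and, in the same pass,
/// marks which buffers are still referenced by the kept views.
fn filter2(views: &[View], predicate: &[bool], num_buffers: usize) -> (Vec<View>, FilterStats) {
    assert_eq!(views.len(), predicate.len());
    let mut referenced = vec![false; num_buffers];
    let mut kept = Vec::new();
    for (view, &keep) in views.iter().zip(predicate) {
        if !keep {
            continue;
        }
        if let View::Ref { buffer_index } = *view {
            referenced[buffer_index] = true;
        }
        kept.push(*view);
    }
    let empty_buffers = referenced.iter().filter(|&&r| !r).count();
    (kept, FilterStats { total_buffers: num_buffers, empty_buffers })
}

fn main() {
    let views = [
        View::Ref { buffer_index: 0 },
        View::Inline,
        View::Ref { buffer_index: 1 },
        View::Ref { buffer_index: 2 },
    ];
    let (kept, stats) = filter2(&views, &[false, true, true, false], 3);
    println!(
        "kept {} views; {} of {} buffers empty ({:.2})",
        kept.len(),
        stats.empty_buffers,
        stats.total_buffers,
        stats.empty_ratio()
    );
    // The caller, not `filter2`, decides whether a separate GC pass is worth it.
    if stats.empty_ratio() > 0.5 {
        println!("consider running the second type of GC");
    }
}
```

Here only buffer 1 is still referenced, so the sketch reports 2 of 3 buffers as empty. The point of the design is that the bookkeeping is a cheap side effect of the filter loop, while the decision to compact stays with the caller.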
