chloro-pn commented on PR #6154:
URL: https://github.com/apache/arrow-rs/pull/6154#issuecomment-2256818105

   My suggestion is to treat the **second** type of GC like the current GC 
(the **third** type): as an independent method that users can choose to 
call.
   The **first** type of GC in this PR can be placed in a new filter method, 
temporarily called `filter2` to distinguish it from the existing `filter` 
method. `filter2` can return statistics gathered during filtering, which 
help users decide whether to perform other GC operations afterwards. For 
example, it can return the number of empty `buffer`s and their proportion 
among the `buffers`, to guide users on whether to perform the **second** 
type of GC.
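
   A minimal sketch of what such a return value could look like, using 
simplified stand-in types rather than the actual arrow-rs ones. `View`, 
`FilterStats`, `empty_ratio`, and this signature of `filter2` are all 
hypothetical, and "empty" is assumed here to mean "no surviving view still 
points into the buffer":

```rust
/// Simplified stand-in for one view: a byte range inside one data buffer.
/// Hypothetical type, not the arrow-rs representation (inlined values are ignored).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct View {
    pub buffer_index: usize,
    pub offset: usize,
    pub len: usize,
}

/// Statistics gathered during filtering (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub struct FilterStats {
    /// Number of data buffers the input array carried.
    pub total_buffers: usize,
    /// Buffers that no surviving view references (assumed meaning of "empty").
    pub empty_buffers: usize,
}

impl FilterStats {
    /// Proportion of empty buffers among all buffers; 0.0 when there are none.
    pub fn empty_ratio(&self) -> f64 {
        if self.total_buffers == 0 {
            0.0
        } else {
            self.empty_buffers as f64 / self.total_buffers as f64
        }
    }
}

/// Keeps the views selected by `mask` and, in the same pass, records which
/// buffers are still referenced. Panics if the lengths differ or if a kept
/// view has a `buffer_index` outside `0..num_buffers`.
pub fn filter2(views: &[View], mask: &[bool], num_buffers: usize) -> (Vec<View>, FilterStats) {
    assert_eq!(views.len(), mask.len(), "mask length must match number of views");

    let mut referenced = vec![false; num_buffers];
    let mut kept = Vec::new();

    for (view, &keep) in views.iter().zip(mask) {
        if keep {
            referenced[view.buffer_index] = true;
            kept.push(*view);
        }
    }

    let empty_buffers = referenced.iter().filter(|&&r| !r).count();
    let stats = FilterStats {
        total_buffers: num_buffers,
        empty_buffers,
    };
    (kept, stats)
}
```

   A caller could then compare `stats.empty_ratio()` against a threshold of 
its own choosing before deciding whether the **second** type of GC is worth 
running.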
   Statistics could likewise guide users on whether to call the **third** 
type of GC, but I do not yet understand its implementation details. I need 
to look at the source code later.
   The reason for encapsulating the **first** type of GC in `filter2`, 
instead of providing it as a standalone method like the other types of GC, 
is that it is lightweight and can be done as part of the filtering pass. By 
contrast, putting the **second** type of GC in the `filter` method is no 
more advantageous than calling it as an independent method.


