XiangpengHao commented on issue #5513:
URL: https://github.com/apache/arrow-rs/issues/5513#issuecomment-2166698054

   (sorry for jumping into this from nowhere)
   I'm trying to push this forward by summarizing the todo items:
   - Implement a dumb GC: copy the values into new buffers through the view type 
builders. It's up to the user to call GC and to make sure the call is 
worthwhile, i.e., that the benefit of GC (a smaller memory footprint) outweighs 
the overhead (rebuilding the array). Users should typically call GC only after 
a slice or filter (i.e., a significant decrease in cardinality).
   - Discuss and justify a compact check API, for users who want to be smart 
and GC only when the view array is sparse. 
   - As discussed in 
https://github.com/apache/arrow-rs/issues/5513#issuecomment-2102825827, we can 
be smart when constructing string arrays by using a hash table to deduplicate 
the strings. I believe this applies not only to view arrays but also to byte arrays.
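
To make the first two items concrete, here is a rough sketch of a "dumb" GC plus a sparseness check. It uses a toy model rather than the arrow-rs types: `ToyViewArray`, `is_sparse`, `gc`, and the `threshold` parameter are all hypothetical names made up for illustration, and the model ignores details of the real view layout such as inlined short strings.

```rust
/// Toy stand-in for a string view array. NOT the arrow-rs type.
#[derive(Debug, Clone)]
struct ToyViewArray {
    /// Backing data buffers, possibly inherited from a much larger parent array.
    buffers: Vec<Vec<u8>>,
    /// One (buffer index, offset, length) triple per value.
    views: Vec<(usize, usize, usize)>,
}

impl ToyViewArray {
    /// Bytes of the i-th value.
    fn value(&self, i: usize) -> &[u8] {
        let (buf, off, len) = self.views[i];
        &self.buffers[buf][off..off + len]
    }

    /// Bytes referenced by the views (overlapping views are counted once per view).
    fn used_bytes(&self) -> usize {
        self.views.iter().map(|v| v.2).sum()
    }

    /// Bytes kept alive by the buffers, referenced or not.
    fn held_bytes(&self) -> usize {
        self.buffers.iter().map(|b| b.len()).sum()
    }

    /// Hypothetical compact check: true when the views reference less than
    /// `threshold` (a fraction in 0.0..=1.0) of the bytes being held.
    fn is_sparse(&self, threshold: f64) -> bool {
        let held = self.held_bytes();
        held > 0 && (self.used_bytes() as f64) < threshold * held as f64
    }

    /// Dumb GC: copy every referenced value into one fresh buffer and
    /// rewrite the views to point into it. Always copies, never checks.
    fn gc(&self) -> ToyViewArray {
        let mut data = Vec::with_capacity(self.used_bytes());
        let mut views = Vec::with_capacity(self.views.len());
        for i in 0..self.views.len() {
            let bytes = self.value(i);
            views.push((0, data.len(), bytes.len()));
            data.extend_from_slice(bytes);
        }
        ToyViewArray { buffers: vec![data], views }
    }
}
```

A caller that wants to be smart would run something like `if arr.is_sparse(0.5) { arr = arr.gc(); }` after a slice or filter; a caller that already knows the array shrank a lot can call `gc()` directly and skip the check.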
   
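For the deduplication item, the idea could look roughly like the sketch below. Again this is a toy model and not an arrow-rs builder: `DedupBuilder` and its fields are hypothetical, and it keeps a single data buffer for simplicity. It relies on the fact that several views may point at the same bytes.

```rust
use std::collections::HashMap;

/// Toy builder that stores each distinct string only once. NOT an arrow-rs API.
struct DedupBuilder {
    /// Single backing buffer holding each distinct value once.
    data: Vec<u8>,
    /// One (offset, length) pair per appended value.
    views: Vec<(usize, usize)>,
    /// Hash table from value bytes to the view that already stores them.
    seen: HashMap<Vec<u8>, (usize, usize)>,
}

impl DedupBuilder {
    fn new() -> Self {
        DedupBuilder { data: Vec::new(), views: Vec::new(), seen: HashMap::new() }
    }

    /// Append a value, reusing the existing bytes if it was seen before.
    fn append(&mut self, value: &str) {
        let bytes = value.as_bytes();
        if let Some(&view) = self.seen.get(bytes) {
            // Duplicate: only a new view is added, no bytes are copied.
            self.views.push(view);
            return;
        }
        let view = (self.data.len(), bytes.len());
        self.data.extend_from_slice(bytes);
        self.seen.insert(bytes.to_vec(), view);
        self.views.push(view);
    }

    /// String stored at position i.
    fn value(&self, i: usize) -> &str {
        let (off, len) = self.views[i];
        std::str::from_utf8(&self.data[off..off + len]).unwrap()
    }
}
```

The trade-off is a hash lookup (and one extra copy of each distinct key in the table) per append, so whether this pays off depends on how repetitive the input is.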


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.