XiangpengHao commented on issue #5513: URL: https://github.com/apache/arrow-rs/issues/5513#issuecomment-2166698054
(Sorry for jumping into this from nowhere.) I'm trying to push this forward by summarizing the to-do items:

- Implement a dumb GC: collect the values into new buffers through the view type builders. It's up to the user to call GC and to ensure the call is beneficial, i.e., that the benefit of GC (a smaller memory footprint) outweighs the overhead (re-creating the array). Users should typically call GC only after a slice or filter (i.e., after a significant decrease in cardinality).
- Discuss and justify a compact check API, for users who want to be smart and only GC when the view array is sparse.
- As discussed in https://github.com/apache/arrow-rs/issues/5513#issuecomment-2102825827, we can be smart when constructing string arrays by using a hash table to deduplicate the strings. I believe this applies not only to view arrays but also to byte arrays.
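To make the three items concrete, here is a rough sketch on a toy model of a view array, using only the standard library. It is not the arrow-rs layout or API: `View`, `ToyViewArray`, `ToyViewBuilder`, `take`, `utilization`, and `gc` are all hypothetical names made up for illustration. The sketch assumes a view is just a (buffer, offset, length) triple and ignores inlined short strings and nulls.

```rust
use std::collections::HashMap;

/// One view: a byte range inside one of the array's data buffers.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer: usize,
    offset: usize,
    len: usize,
}

/// Toy view array (hypothetical; not the arrow-rs memory layout).
#[derive(Debug)]
struct ToyViewArray {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

/// Toy builder that packs appended values into a single fresh buffer.
struct ToyViewBuilder {
    views: Vec<View>,
    data: Vec<u8>,
    /// Present only when deduplication is requested.
    seen: Option<HashMap<Vec<u8>, View>>,
}

impl ToyViewBuilder {
    fn new(dedup: bool) -> Self {
        ToyViewBuilder {
            views: Vec::new(),
            data: Vec::new(),
            seen: if dedup { Some(HashMap::new()) } else { None },
        }
    }

    fn append(&mut self, value: &[u8]) {
        // With dedup on, reuse the view of an identical earlier value.
        if let Some(seen) = &self.seen {
            if let Some(&view) = seen.get(value) {
                self.views.push(view);
                return;
            }
        }
        let view = View { buffer: 0, offset: self.data.len(), len: value.len() };
        self.data.extend_from_slice(value);
        if let Some(seen) = &mut self.seen {
            seen.insert(value.to_vec(), view);
        }
        self.views.push(view);
    }

    fn finish(self) -> ToyViewArray {
        ToyViewArray { views: self.views, buffers: vec![self.data] }
    }
}

impl ToyViewArray {
    fn value(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        &self.buffers[v.buffer][v.offset..v.offset + v.len]
    }

    /// Filter by index: only the views shrink; every data buffer is kept.
    fn take(&self, indices: &[usize]) -> ToyViewArray {
        ToyViewArray {
            views: indices.iter().map(|&i| self.views[i]).collect(),
            // Cloned here for simplicity; a real array would share the buffers.
            buffers: self.buffers.clone(),
        }
    }

    /// Hypothetical compact check: bytes referenced by views / bytes held.
    /// Can exceed 1.0 when several views share the same bytes.
    fn utilization(&self) -> f64 {
        let held: usize = self.buffers.iter().map(|b| b.len()).sum();
        if held == 0 {
            return 1.0;
        }
        let referenced: usize = self.views.iter().map(|v| v.len).sum();
        referenced as f64 / held as f64
    }

    /// "Dumb" GC: re-append every value through a builder, leaving behind
    /// the bytes that no view references.
    fn gc(&self) -> ToyViewArray {
        let mut builder = ToyViewBuilder::new(false);
        for i in 0..self.views.len() {
            builder.append(self.value(i));
        }
        builder.finish()
    }
}
```

In this model a caller would filter with `take`, look at `utilization()`, and call `gc()` only when the ratio falls below some threshold of their own choosing. The cutoff is left to the user, in line with the first item above.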
