asubiotto commented on PR #7519: URL: https://github.com/apache/arrow-rs/pull/7519#issuecomment-2897393095
Thanks @tustvold, I wasn't aware of that PR. In our case we don't do any computations on these columns but care about the memory savings since we run in memory-constrained environments. Our data for these columns specifically has very few unique values (0.03% is a recent number). Additionally, our schema is deeply nested and these columns are usually found in the leaves (within lists of structs) so memory savings per batch is magnified. Granted, our idea was to have these be REE but that wasn't working for some reason (can't remember why but I should experiment when I have some time). Let me know how you'd like to proceed. I guess one point in favor of this PR is that if someone is using primitive dictionary batches it is likely that they value memory over perf. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
