JakeDern commented on PR #8001: URL: https://github.com/apache/arrow-rs/pull/8001#issuecomment-3155883271
@asubiotto the approach I took is to accumulate dictionary values on the builder only, via a new `finish_preserve_values` API. This was very simple to do, and I think it is closest to the Go implementation, which seems to do this by default. When `finish_preserve_values` is called, the dictionary values are copied into the produced record batch and the internal de-dup dictionary is preserved; only the keys are cleared.

I also did a little refactoring to get better visibility into the messages that the reader sees. Since we're trying to improve the conditions under which delta dictionaries are emitted (an optimization), we need this visibility to test precisely rather than relying on heuristics like the size of the underlying stream.

Feedback would be greatly appreciated! If this approach seems reasonable, I can add the same `finish_preserve_values` API to the other dictionary types as well.
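To make the intended semantics concrete, here is a minimal, self-contained sketch of the behavior described above. It is a toy model and not the arrow-rs code: `ToyDictBuilder`, `ToyDictArray`, `delta_values`, and `demo` are hypothetical names invented for illustration, and only the method name `finish_preserve_values` comes from this PR. The sketch assumes that a plain `finish` resets the whole builder, while `finish_preserve_values` copies the values out and clears only the keys.

```rust
use std::collections::HashMap;

/// Toy stand-in for a dictionary-encoded string builder (hypothetical, not the arrow-rs type).
#[derive(Default)]
struct ToyDictBuilder {
    dedup: HashMap<String, u32>, // value -> key, used for de-duplication
    values: Vec<String>,         // dictionary values in insertion order
    keys: Vec<u32>,              // one key per appended row
}

/// Toy stand-in for a finished dictionary array: keys plus a copy of the dictionary.
#[derive(Debug, PartialEq)]
struct ToyDictArray {
    keys: Vec<u32>,
    values: Vec<String>,
}

impl ToyDictBuilder {
    /// Append one row, reusing the existing key if the value was seen before.
    fn append(&mut self, v: &str) -> u32 {
        let existing = self.dedup.get(v).copied();
        let key = match existing {
            Some(k) => k,
            None => {
                let k = self.values.len() as u32;
                self.values.push(v.to_string());
                self.dedup.insert(v.to_string(), k);
                k
            }
        };
        self.keys.push(key);
        key
    }

    /// Assumed behavior of a plain finish: keys, values, and de-dup map are all reset.
    fn finish(&mut self) -> ToyDictArray {
        self.dedup.clear();
        ToyDictArray {
            keys: std::mem::take(&mut self.keys),
            values: std::mem::take(&mut self.values),
        }
    }

    /// Behavior described in the comment: values are copied out, the de-dup
    /// dictionary is kept, and only the keys are cleared.
    fn finish_preserve_values(&mut self) -> ToyDictArray {
        ToyDictArray {
            keys: std::mem::take(&mut self.keys),
            values: self.values.clone(),
        }
    }
}

/// If `next`'s dictionary extends `prev`'s, return only the newly added values
/// (what a delta dictionary would need to carry); otherwise return None.
fn delta_values<'a>(prev: &ToyDictArray, next: &'a ToyDictArray) -> Option<&'a [String]> {
    if next.values.starts_with(&prev.values) {
        Some(&next.values[prev.values.len()..])
    } else {
        None
    }
}

/// Build two consecutive batches from the same builder.
fn demo() -> (ToyDictArray, ToyDictArray) {
    let mut b = ToyDictBuilder::default();
    b.append("a");
    b.append("b");
    b.append("a");
    let first = b.finish_preserve_values();
    b.append("b"); // reuses the key assigned in the first batch
    b.append("c"); // new value, appended to the preserved dictionary
    let second = b.finish_preserve_values();
    (first, second)
}
```

Because the second batch's dictionary is a strict extension of the first, a writer in this model could send only the new suffix as a delta instead of replacing the whole dictionary. With a plain `finish` between batches, key assignment would start over and that prefix relationship is not guaranteed.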