JakeDern commented on PR #8001:
URL: https://github.com/apache/arrow-rs/pull/8001#issuecomment-3155883271

   @asubiotto the approach I opted for is to allow accumulating values on the 
builder only, via a `finish_preserve_values` API. This was very simple to do, 
and I think it is closest to the Go implementation, which seems to do this by 
default. When it is called, the dictionary values are simply copied to the 
produced record batch and the internal de-dup dictionary is preserved. Only 
the keys are cleared.
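
   To illustrate the intended semantics, here is a toy sketch in plain Rust. 
It is not the arrow-rs implementation: `ToyDictBuilder`, its fields, and 
`append` are hypothetical names used only for illustration, and only the 
method name `finish_preserve_values` comes from this PR. It assumes a string 
dictionary with `u32` keys.

```rust
use std::collections::HashMap;

/// Toy string dictionary builder (hypothetical; not the arrow-rs type).
struct ToyDictBuilder {
    /// De-dup map from value to its dictionary key.
    dedup: HashMap<String, u32>,
    /// Dictionary values in insertion order.
    values: Vec<String>,
    /// Keys for the batch currently being built.
    keys: Vec<u32>,
}

impl ToyDictBuilder {
    fn new() -> Self {
        Self { dedup: HashMap::new(), values: Vec::new(), keys: Vec::new() }
    }

    /// Append a value, reusing its existing key if it was seen before.
    fn append(&mut self, v: &str) {
        let key = match self.dedup.get(v) {
            Some(&k) => k,
            None => {
                let k = self.values.len() as u32;
                self.dedup.insert(v.to_string(), k);
                self.values.push(v.to_string());
                k
            }
        };
        self.keys.push(key);
    }

    /// Return this batch's keys plus a copy of all values seen so far.
    /// Only the keys are cleared; the values and de-dup map are kept.
    fn finish_preserve_values(&mut self) -> (Vec<u32>, Vec<String>) {
        let keys = std::mem::take(&mut self.keys);
        (keys, self.values.clone())
    }
}

fn main() {
    let mut b = ToyDictBuilder::new();
    b.append("a");
    b.append("b");
    b.append("a");
    let (k1, v1) = b.finish_preserve_values();
    assert_eq!(k1, [0, 1, 0]);
    assert_eq!(v1, ["a", "b"]);

    // The second batch reuses key 1 for "b" and only adds "c".
    b.append("b");
    b.append("c");
    let (k2, v2) = b.finish_preserve_values();
    assert_eq!(k2, [1, 2]);
    assert_eq!(v2, ["a", "b", "c"]);

    // The second dictionary extends the first, so only the tail is new.
    assert_eq!(&v2[v1.len()..], ["c"]);
}
```

   Because the values of the second batch extend those of the first, a writer 
could in principle send only the new tail as a delta dictionary.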
   
   I also did a little bit of refactoring to get better visibility into the 
messages that the reader sees. Since we're trying to improve the conditions 
under which delta dictionaries are emitted (an optimization), we need this 
visibility to test precisely rather than rely on heuristics like the size of 
the underlying stream.
   
   Feedback would be greatly appreciated! If this approach seems reasonable, 
I can add the same `finish_preserve_values` API to the other dictionary types 
as well.

