albertlockett commented on PR #8001:
URL: https://github.com/apache/arrow-rs/pull/8001#issuecomment-3141132458

   It seems like maybe what we need is a way to efficiently detect whether a 
value is already in the dictionary. The dictionary builders all keep some kind 
of internal state that allows an efficient lookup of this. For example: 
   
https://github.com/apache/arrow-rs/blob/876585c1cd986dbaee0c26d52b55a4186a2f68c8/arrow-array/src/builder/generic_bytes_dictionary_builder.rs#L42
   
https://github.com/apache/arrow-rs/blob/876585c1cd986dbaee0c26d52b55a4186a2f68c8/arrow-array/src/builder/fixed_size_binary_dictionary_builder.rs#L66
   
https://github.com/apache/arrow-rs/blob/876585c1cd986dbaee0c26d52b55a4186a2f68c8/arrow-array/src/builder/primitive_dictionary_builder.rs#L90
   
   Maybe we could refactor this into something that's reusable by the IPC 
writer.
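
   To make the idea concrete, here is a minimal, std-only sketch of what such 
a reusable lookup could look like. The name `ValueIndex` and its methods are 
hypothetical and are not part of arrow-rs, and this is not how the builders 
linked above store their state. It only illustrates the "is this value already 
in the dictionary, and under which key?" query.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not an arrow-rs type): remembers which dictionary key
/// was assigned to each distinct value the first time it was seen.
#[derive(Default)]
pub struct ValueIndex {
    /// Value bytes -> dictionary key.
    keys: HashMap<Vec<u8>, usize>,
    /// Distinct values in key order, i.e. the dictionary itself.
    values: Vec<Vec<u8>>,
}

impl ValueIndex {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the key for `value` if it is already in the dictionary.
    pub fn get(&self, value: &[u8]) -> Option<usize> {
        self.keys.get(value).copied()
    }

    /// Returns the key for `value`, inserting it if needed. The boolean is
    /// `true` only when the value was newly added to the dictionary.
    pub fn get_or_insert(&mut self, value: &[u8]) -> (usize, bool) {
        if let Some(&key) = self.keys.get(value) {
            return (key, false);
        }
        let key = self.values.len();
        self.values.push(value.to_vec());
        self.keys.insert(value.to_vec(), key);
        (key, true)
    }

    /// Number of distinct values seen so far.
    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Returns `true` if no values have been inserted yet.
    pub fn is_empty(&self) -> bool {
        self.values.is_empty()
    }
}
```

   A writer could call `get_or_insert` for each incoming value and treat the 
`true` results as the values that are new to the dictionary. Whether that fits 
the IPC writer's needs is an assumption to be checked against the actual code.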


