albertlockett commented on PR #8001: URL: https://github.com/apache/arrow-rs/pull/8001#issuecomment-3141132458
It seems like what we need is a way to efficiently detect whether a value is already in the dictionary. The dictionary builders all keep some kind of internal state that allows an efficient lookup of this. For example:

https://github.com/apache/arrow-rs/blob/876585c1cd986dbaee0c26d52b55a4186a2f68c8/arrow-array/src/builder/generic_bytes_dictionary_builder.rs#L42
https://github.com/apache/arrow-rs/blob/876585c1cd986dbaee0c26d52b55a4186a2f68c8/arrow-array/src/builder/fixed_size_binary_dictionary_builder.rs#L66
https://github.com/apache/arrow-rs/blob/876585c1cd986dbaee0c26d52b55a4186a2f68c8/arrow-array/src/builder/primitive_dictionary_builder.rs#L90

Maybe we could refactor this into something that's reusable by the IPC writer.
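To make the idea of a reusable lookup concrete, here is a rough sketch of the kind of state involved: a map from each value to the dictionary key it was assigned. `ValueIndex` and its methods are hypothetical names made up for illustration, not existing arrow-rs types, and the actual builders' internals differ in detail (for instance in how they hash and store values).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper (not part of arrow-rs): remembers which values
/// already have a dictionary key so that membership checks are O(1) on average.
#[derive(Debug, Default)]
pub struct ValueIndex<V: Hash + Eq> {
    keys: HashMap<V, usize>,
}

impl<V: Hash + Eq> ValueIndex<V> {
    /// Creates an empty index.
    pub fn new() -> Self {
        Self { keys: HashMap::new() }
    }

    /// Returns the key for `value` if it is already in the dictionary.
    pub fn get(&self, value: &V) -> Option<usize> {
        self.keys.get(value).copied()
    }

    /// Returns `(key, inserted)`. `inserted` is true when `value` was new
    /// and has just been assigned the next free key.
    pub fn get_or_insert(&mut self, value: V) -> (usize, bool) {
        let next = self.keys.len();
        match self.keys.entry(value) {
            Entry::Occupied(e) => (*e.get(), false),
            Entry::Vacant(e) => {
                e.insert(next);
                (next, true)
            }
        }
    }

    /// Number of distinct values seen so far.
    pub fn len(&self) -> usize {
        self.keys.len()
    }

    /// True when no values have been recorded.
    pub fn is_empty(&self) -> bool {
        self.keys.is_empty()
    }
}
```

With something along these lines, a caller such as the IPC writer could check the `inserted` flag to learn whether a value is new to the dictionary, without scanning the existing dictionary values.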