brancz opened a new issue, #37039: URL: https://github.com/apache/arrow/issues/37039
### Describe the enhancement requested

We have a case where we receive multiple dictionary arrays as a stream and have to build one "final" dictionary from them. The unifier doesn't really work for us because the builder is not "done" during the intermediate appends, so we never have two finished arrays to unify.

What I was thinking instead was to build a translation table at each intermediate step by calling `GetOrInsert` on the memo table. That way we don't have to perform a memo-table lookup for every value and can insert the translated indices directly. (Profiling data clearly shows that the memo-table lookups are very expensive, and we know that our workload deduplicates heavily with dictionaries.)

Would a patch exposing the `*hashing.BinaryMemoTable` on the `*array.BinaryDictionaryBuilder` be accepted? I'm happy to write it, but wanted to open this issue to discuss it first.

cc @zero

### Component(s)

Go
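A minimal, self-contained Go sketch of the translation-table idea. It assumes a simplified, hypothetical `memoTable` type standing in for `hashing.BinaryMemoTable`; its `GetOrInsert` signature is illustrative only and is not Arrow's actual API. Each incoming chunk's dictionary is looked up once per distinct entry, producing a table from chunk-local indices to unified indices. The chunk's indices are then remapped with plain slice indexing, so no hashing happens on the per-value path.

```go
package main

import "fmt"

// memoTable is a hypothetical, simplified stand-in for Arrow's
// hashing.BinaryMemoTable: it assigns a stable index to each distinct value.
type memoTable struct {
	index  map[string]int32
	values []string
}

func newMemoTable() *memoTable {
	return &memoTable{index: make(map[string]int32)}
}

// GetOrInsert returns the unified index of v, inserting v if it is absent.
// found reports whether v was already present. (Illustrative signature.)
func (m *memoTable) GetOrInsert(v string) (idx int32, found bool) {
	if i, ok := m.index[v]; ok {
		return i, true
	}
	i := int32(len(m.values))
	m.index[v] = i
	m.values = append(m.values, v)
	return i, false
}

// buildTranslation does one memo-table lookup per dictionary entry
// (not per value) and returns a table mapping chunk-local indices
// to indices in the unified dictionary.
func buildTranslation(m *memoTable, chunkDict []string) []int32 {
	table := make([]int32, len(chunkDict))
	for i, v := range chunkDict {
		table[i], _ = m.GetOrInsert(v)
	}
	return table
}

// remapIndices rewrites a chunk's indices through the translation table;
// this is a plain slice lookup per value, with no hashing.
func remapIndices(table []int32, indices []int32) []int32 {
	out := make([]int32, len(indices))
	for i, idx := range indices {
		out[i] = table[idx]
	}
	return out
}

func main() {
	m := newMemoTable()

	// Chunk 1: local dictionary ["a", "b"], values a, b, a, a.
	t1 := buildTranslation(m, []string{"a", "b"})
	fmt.Println(remapIndices(t1, []int32{0, 1, 0, 0})) // [0 1 0 0]

	// Chunk 2: local dictionary ["c", "a"], values c, a, c.
	t2 := buildTranslation(m, []string{"c", "a"})
	fmt.Println(remapIndices(t2, []int32{0, 1, 0})) // [2 0 2]

	// Unified dictionary after both chunks.
	fmt.Println(m.values) // [a b c]
}
```

The cost of the memo-table work scales with the number of distinct dictionary entries per chunk rather than the number of values, which is where a heavily deduplicating workload would benefit. With a real builder, the remapped indices would be appended directly; whether the builder should expose its memo table to allow this is the open question of the issue.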
