brancz opened a new issue, #37039:
URL: https://github.com/apache/arrow/issues/37039

   ### Describe the enhancement requested
   
   We have a case where multiple dictionary arrays are streamed to us and we 
have to build one "final" dictionary from them. The unifier doesn't really work 
for us because the builder is not "done" during intermediate appends, so we 
never have two finished arrays to unify.
   
   My idea is to build a translation table at each intermediate step by calling 
`GetOrInsert` on the memo table once per dictionary value. That way we don't 
have to do a memo-table lookup for every value and can instead append the 
translated indices directly. (Profiling data clearly shows that the memo-table 
lookups are very expensive, and we know that our workload deduplicates heavily 
with dictionaries.)
   
   Would exposing the `*hashing.BinaryMemoTable` on the 
`*array.BinaryDictionaryBuilder` be accepted? I'm happy to write the patch, but 
wanted to open an issue to discuss it first.
   
   cc @zero
   
   ### Component(s)
   
   Go


