kdkavanagh commented on issue #47151:
URL: https://github.com/apache/arrow/issues/47151#issuecomment-3168381924

   Agree that option 1 is far better than option 2 for performance reasons 
(performance is the whole reason to prepopulate and use `appendIndices`).
   
   If `memo_table_` wasn't populated but the underlying dictionary array was 
populated manually, what would happen if the user invoked the regular 
`append()` instead of `appendIndices()` on the outer `DictionaryBuilder`? 
   
   Perhaps a happy medium is for `InsertMemoValues` to tolerate duplicates: it 
adds every value to the underlying dictionary array but inserts only unique 
values into the `memo_table_` data structure. Subsequent calls to `append()` 
would effectively use the "first" value->index mapping (the one stored in 
`memo_table_`), while direct calls to `appendIndices()` would still map 
directly to what the user provided to `InsertMemoValues`.
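
   To make the proposed behaviour concrete, here is a minimal sketch of the 
semantics. It is a toy model, not Arrow's actual `DictionaryBuilder`: the 
`ToyDictionaryBuilder` type, its fields, and the string-only value type are 
all assumptions made for illustration, and the method names only mirror the 
ones discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model of the proposal; NOT Arrow's DictionaryBuilder implementation.
struct ToyDictionaryBuilder {
  std::vector<std::string> dictionary;                       // may hold duplicates
  std::unordered_map<std::string, std::int32_t> memo_table;  // value -> first index
  std::vector<std::int32_t> indices;                         // appended index data

  // Hypothetical duplicate-tolerant InsertMemoValues: every value is added to
  // the dictionary, but only the first occurrence is recorded in the memo table.
  void InsertMemoValues(const std::vector<std::string>& values) {
    for (const auto& value : values) {
      const auto index = static_cast<std::int32_t>(dictionary.size());
      dictionary.push_back(value);       // position always matches user input
      memo_table.emplace(value, index);  // no-op if value is already memoized
    }
  }

  // Value-based append: resolves the index through the memo table, so a
  // duplicated value always maps to its first dictionary position.
  void Append(const std::string& value) {
    auto it = memo_table.find(value);
    if (it == memo_table.end()) {
      const auto index = static_cast<std::int32_t>(dictionary.size());
      dictionary.push_back(value);
      it = memo_table.emplace(value, index).first;
    }
    indices.push_back(it->second);
  }

  // Index-based append: bypasses the memo table and trusts the caller.
  void AppendIndices(const std::vector<std::int32_t>& new_indices) {
    for (const std::int32_t i : new_indices) {
      assert(i >= 0 && static_cast<std::size_t>(i) < dictionary.size());
      indices.push_back(i);
    }
  }
};
```

   With this model, inserting `{"a", "b", "a"}` leaves three dictionary 
entries but only two memo entries. A later `Append("a")` yields index 0, while 
`AppendIndices({2})` still refers to the second copy of "a" exactly as the 
user laid it out.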


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

