kdkavanagh commented on issue #47151: URL: https://github.com/apache/arrow/issues/47151#issuecomment-3168381924
Agree that option 1 is far better than option 2 for performance reasons (performance is the whole reason to prepopulate and use `appendIndices`). If the `memo_table_` wasn't populated but the underlying dictionary array was (manually) populated, what would happen if the user invoked the regular `append()` instead of `appendIndices()` on the outer `DictionaryBuilder`? Perhaps a happy medium is that `InsertMemoValues` tolerates duplicates and adds them all to the underlying dictionary array, but only inserts unique values into the memo_table data structure. Subsequent calls to `append()` would effectively use the "first" value->index mapping (the one that was stored in the memo_table), while direct calls to `appendIndices()` would still map directly to what the user provided to `InsertMemoValues`.
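
To make the proposed "happy medium" concrete, here is a minimal, self-contained sketch of the behavior. It is a toy model written for illustration, not Arrow's implementation: `ToyDictionaryBuilder` is a hypothetical class, the `std::unordered_map` stands in for the real memo table, and the method names only mirror the ones discussed above.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy stand-in for a dictionary builder; NOT the Arrow API.
class ToyDictionaryBuilder {
 public:
  // Pre-populate the dictionary. Duplicates are tolerated: every value is
  // appended to the dictionary array, but only the first occurrence of each
  // value is recorded in the memo table.
  void InsertMemoValues(const std::vector<std::string>& values) {
    for (const auto& value : values) {
      const auto index = static_cast<int32_t>(dictionary_.size());
      dictionary_.push_back(value);
      memo_table_.emplace(value, index);  // no-op if value is already mapped
    }
  }

  // Value-based append: resolves the value through the memo table, so a
  // duplicated value always maps to its first index.
  void Append(const std::string& value) {
    auto it = memo_table_.find(value);
    if (it == memo_table_.end()) {
      const auto index = static_cast<int32_t>(dictionary_.size());
      dictionary_.push_back(value);
      it = memo_table_.emplace(value, index).first;
    }
    indices_.push_back(it->second);
  }

  // Index-based append: stores the caller's indices as given and never
  // consults the memo table.
  void AppendIndices(const std::vector<int32_t>& indices) {
    indices_.insert(indices_.end(), indices.begin(), indices.end());
  }

  const std::vector<std::string>& dictionary() const { return dictionary_; }
  const std::vector<int32_t>& indices() const { return indices_; }

 private:
  std::vector<std::string> dictionary_;
  std::unordered_map<std::string, int32_t> memo_table_;
  std::vector<int32_t> indices_;
};
```

With this model, pre-populating with `{"a", "b", "a"}` leaves three dictionary entries; `AppendIndices({2})` keeps index 2 exactly as provided, while `Append("a")` resolves to index 0, the first mapping stored in the memo table.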
