ella-chao opened a new issue, #38988:
URL: https://github.com/apache/arrow/issues/38988

   ### Describe the enhancement requested
   
   I have a use case where it would be useful to know the size of the 
dictionary as values are appended to a dictionary builder. Specifically, I am 
indexing data where the number of unique values is not known in advance. 
Because the number of unique values is likely to be relatively small, I use a 
`BinaryDictionaryBuilder` and fall back to a `LargeStringBuilder` only when I 
detect that the dictionary is growing too big.
   
   The problem is that there is currently no easy way to get the size of the 
dictionary in a `BinaryDictionaryBuilder`. As a workaround, after each 
`AppendString` to the `BinaryDictionaryBuilder` I do the following:
   ```
   lastDictIndex := bldrDictString.(*arrowarray.BinaryDictionaryBuilder).GetValueIndex(i)
   if lastDictIndex+1 > cardinality {
       cardinality = lastDictIndex + 1
   }
   ```
   where `i` is the index of the value appended.
   
   It would be more convenient, and potentially cheaper, if the `MemoTable`, 
or even just the size of the dictionary, were exposed. Is this something you 
would be open to? I would be happy to open a PR if so.
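
   To illustrate the kind of accessor I have in mind, here is a minimal, 
standard-library-only sketch. It does not use the Arrow Go API: `dictEncoder`, 
`DictionarySize`, and `encode` are hypothetical names that only model a memo 
table with a size accessor and the fallback check described above.

```go
package main

import "fmt"

// dictEncoder is a hypothetical stand-in for a dictionary builder. It is NOT
// part of the Arrow Go API; it only models a memo table plus an index column.
type dictEncoder struct {
	memo    map[string]int // value -> dictionary index
	indices []int          // one dictionary index per appended value
}

func newDictEncoder() *dictEncoder {
	return &dictEncoder{memo: make(map[string]int)}
}

// AppendString records v, adding it to the dictionary the first time it is seen.
func (e *dictEncoder) AppendString(v string) {
	idx, ok := e.memo[v]
	if !ok {
		idx = len(e.memo)
		e.memo[v] = idx
	}
	e.indices = append(e.indices, idx)
}

// DictionarySize models the requested accessor (assumed name): the number of
// unique values seen so far, read directly from the memo table.
func (e *dictEncoder) DictionarySize() int { return len(e.memo) }

// encode dictionary-encodes values and stops as soon as the dictionary grows
// past maxCardinality, signalling that the caller should fall back to a
// plain (non-dictionary) builder.
func encode(values []string, maxCardinality int) (enc *dictEncoder, fellBack bool) {
	enc = newDictEncoder()
	for _, v := range values {
		enc.AppendString(v)
		if enc.DictionarySize() > maxCardinality {
			return enc, true
		}
	}
	return enc, false
}

func main() {
	enc, fellBack := encode([]string{"a", "b", "a", "c", "b"}, 8)
	fmt.Println(enc.DictionarySize(), fellBack)
}
```

   With an accessor like this, the cardinality check becomes a single call 
after each append, with no need to look up the index of the last appended 
value and track the maximum by hand.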
   
   
   
   ### Component(s)
   
   Go


