alamb commented on issue #7200:
URL: 
https://github.com/apache/arrow-datafusion/issues/7200#issuecomment-1690147397

   @JayjeetAtGithub  -- in terms of identifying "high cardinality" dictionaries, 
perhaps we can use some sort of heuristic like "the total number of distinct values 
used in the dictionary is greater than N", where "N" is a constant like `8` or 
`32` (maybe @tustvold has some thoughts on the right values to use).
   
   You can find which dictionary values are used with this method: 
https://docs.rs/arrow/latest/arrow/array/struct.DictionaryArray.html#method.occupancy
   
   (and then count the set bits in the result to get the number of values used)
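
   A minimal std-only sketch of that heuristic, for illustration. It models the dictionary keys as a plain slice instead of using the arrow crate, so `occupancy`, `used_value_count`, `is_high_cardinality` and `HIGH_CARDINALITY_THRESHOLD` below are hypothetical stand-ins, not DataFusion or arrow code. With arrow itself, the equivalent would presumably be calling `occupancy()` on the `DictionaryArray` and counting the set bits of the returned mask. The threshold value is an assumption taken from the range suggested above.

```rust
/// Hypothetical threshold "N"; 32 is one of the values floated in the comment.
const HIGH_CARDINALITY_THRESHOLD: usize = 32;

/// Std-only stand-in for a dictionary occupancy mask: one flag per dictionary
/// slot, set when at least one non-null key references that slot.
fn occupancy(keys: &[Option<usize>], dictionary_len: usize) -> Vec<bool> {
    let mut used = vec![false; dictionary_len];
    // `flatten` skips null keys (None).
    for key in keys.iter().flatten() {
        // Out-of-range keys are silently ignored in this sketch.
        if let Some(slot) = used.get_mut(*key) {
            *slot = true;
        }
    }
    used
}

/// Number of dictionary slots actually referenced ("number of set bits").
fn used_value_count(keys: &[Option<usize>], dictionary_len: usize) -> usize {
    occupancy(keys, dictionary_len)
        .iter()
        .filter(|&&is_used| is_used)
        .count()
}

/// The heuristic itself: more than `threshold` dictionary values are in use.
fn is_high_cardinality(
    keys: &[Option<usize>],
    dictionary_len: usize,
    threshold: usize,
) -> bool {
    used_value_count(keys, dictionary_len) > threshold
}
```

   Note that this counts referenced dictionary slots, so if the dictionary itself contains duplicate values the result can overestimate the number of distinct values. A caller would pass `HIGH_CARDINALITY_THRESHOLD` (or whatever constant is agreed on) as `threshold`.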


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
