alamb commented on issue #7200: URL: https://github.com/apache/arrow-datafusion/issues/7200#issuecomment-1690147397
@JayjeetAtGithub -- in terms of identifying "high cardinality" dictionaries, perhaps we can use some sort of heuristic like "the total number of distinct values used in the dictionary is greater than N", where "N" is a constant like `8` or `32` (maybe @tustvold has some thoughts on the right values to use).

You can find which values are used with this method: https://docs.rs/arrow/latest/arrow/array/struct.DictionaryArray.html#method.occupancy (and then compute the number of set bits).
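
To make the heuristic concrete, here is a minimal plain-Rust sketch rather than actual arrow code. It models the dictionary keys as a slice of `Option<usize>` and builds a `Vec<bool>` as a stand-in for the mask that `DictionaryArray::occupancy` returns. The names `occupancy`, `is_high_cardinality`, and `HIGH_CARDINALITY_THRESHOLD` are hypothetical, and the value `32` is just one of the candidates mentioned above, not a settled choice.

```rust
/// Hypothetical threshold "N"; 32 is only one of the candidate values.
const HIGH_CARDINALITY_THRESHOLD: usize = 32;

/// Stand-in for an occupancy mask: one flag per dictionary value,
/// set to true if at least one non-null key refers to that value.
fn occupancy(keys: &[Option<usize>], dict_len: usize) -> Vec<bool> {
    let mut used = vec![false; dict_len];
    // `flatten` skips null keys (None).
    for key in keys.iter().flatten() {
        // Keys that fall outside the dictionary are ignored in this sketch.
        if let Some(slot) = used.get_mut(*key) {
            *slot = true;
        }
    }
    used
}

/// Counts the set flags in the mask and applies the
/// "more than N distinct values used" rule.
fn is_high_cardinality(keys: &[Option<usize>], dict_len: usize, threshold: usize) -> bool {
    let distinct_used = occupancy(keys, dict_len)
        .iter()
        .filter(|&&is_used| is_used)
        .count();
    distinct_used > threshold
}
```

With real arrays, the mask would presumably come from `occupancy()` on the `DictionaryArray` and the count from the returned buffer's set-bit count. Counting only the values that keys actually reference matters because a dictionary can hold entries that no key points to, for example after slicing or filtering.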
