[GitHub] [pinot] siddharthteotia commented on pull request #8398: Allow disabling dict generation for High cardinality columns

GitBox Thu, 24 Mar 2022 21:31:47 -0700


siddharthteotia commented on pull request #8398:
URL: https://github.com/apache/pinot/pull/8398#issuecomment-1078644639



   We implemented a config recommendation rule in the RecommendationEngine 
(used by LinkedIn):
   
   
https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/recommender/rules/impl/NoDictionaryOnHeapDictionaryJointRule.java
   
   It needs to be improved based on what we have observed after using it for 
quite some time:
   
   - First, pure aggregation-only queries slow down significantly (3ms vs. 
300ms) if no dictionary is created on the column, because MIN and MAX 
aggregations can be answered from the dictionary instead of scanning the 
table.
   
   - Columns in the SELECT list benefit from having no dictionary, because 
during projection noDictionary avoids the extra hop from the forward index to 
the dictionary. In some cases, we saw a 20% performance improvement in such 
scenarios by not having a dictionary.
   
   - Lastly, as also mentioned in this PR, for low-cardinality columns the 
storage savings from a dictionary can be significant. But regardless of 
cardinality, and especially for STRING columns, predicate evaluation is faster 
on dictionary codes (native arithmetic) than on varchar/string comparison.
   
   We find ourselves recommending noDictionary too aggressively and need to 
balance the above requirements in the rule in our recommendation engine.
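
   As a rough illustration of how the three observations above might be 
balanced, here is a minimal sketch in Java. It is not the logic of 
NoDictionaryOnHeapDictionaryJointRule: the class DictionaryAdvisor, the Usage 
categories, and the 0.1 cardinality cutoff are all hypothetical and chosen 
only for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from Pinot's RecommendationEngine.
final class DictionaryAdvisor {

    /** How the query workload touches a column (assumed categories). */
    enum Usage { MIN_MAX_AGGREGATION, FILTER, SELECT }

    // Assumed cutoff: (distinct values / total rows) at or below this counts as low cardinality.
    static final double LOW_CARDINALITY_RATIO = 0.1;

    /** Returns true if the column looks like a reasonable noDictionary candidate. */
    static boolean recommendNoDictionary(Set<Usage> usages, double cardinalityRatio) {
        // MIN/MAX can be answered from the dictionary without scanning, so keep it.
        if (usages.contains(Usage.MIN_MAX_AGGREGATION)) {
            return false;
        }
        // Predicates evaluate faster on dictionary codes, regardless of cardinality.
        if (usages.contains(Usage.FILTER)) {
            return false;
        }
        // For low cardinality, the dictionary's storage savings can be significant.
        if (cardinalityRatio <= LOW_CARDINALITY_RATIO) {
            return false;
        }
        // Only projection-only columns gain from skipping the forward-index-to-dictionary hop.
        return usages.contains(Usage.SELECT);
    }

    public static void main(String[] args) {
        // High-cardinality column that is only projected: prints true
        System.out.println(recommendNoDictionary(EnumSet.of(Usage.SELECT), 0.9));
        // Same column, but also used in a filter: prints false
        System.out.println(recommendNoDictionary(EnumSet.of(Usage.SELECT, Usage.FILTER), 0.9));
    }
}
```

   The checks are ordered so that any reason to keep the dictionary wins, 
which is the conservative direction given that the current rule recommends 
noDictionary too aggressively.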


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


