[GitHub] [pinot] KKcorps commented on pull request #9527: Do not create dictionary for high-cardinality columns

GitBox Wed, 02 Nov 2022 02:00:29 -0700


KKcorps commented on PR #9527:
URL: https://github.com/apache/pinot/pull/9527#issuecomment-1299884938


   > I can see that for json and text column, we might not want to create 
dictionary, but for other dimensions, in most cases we still want to create 
dictionaries, or a lot of indexes cannot be applied.
   > With the current change, for existing users who have optimize dictionary 
set for metrics, this will automatically apply that to dimensions, which can 
cause serious regression (inverted index cannot be added).
   > How about adding a config to only apply this to json/text column?
   
   Actually, the reason for this change was to introduce this config for 
dimension columns (after complaints from users about space amplification and 
memory usage). 
   JSON and text indexes were brought into scope later. 
   IMO, what we can do instead is introduce a separate config, 
`optimizeDictionaryForDimensions`, and document the risk of setting it. 
   
   Users do have cases where they keep String columns as dimensions but don't 
really do any filtering on them. 
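
   To make the proposal concrete, here is a minimal sketch of the opt-in behaviour, not the actual Pinot code path. All names below (`ColumnInfo`, `DictionaryOptions`, `may_skip_dictionary`) are hypothetical and used only for illustration; the only assumption taken from this thread is that the dimension flag would be separate from the existing metric one and off by default.

```python
from dataclasses import dataclass


@dataclass
class ColumnInfo:
    # Hypothetical description of a column, for illustration only.
    name: str
    is_metric: bool
    needs_inverted_index: bool = False


@dataclass
class DictionaryOptions:
    # Mirrors the idea of the existing metric flag and the proposed
    # dimension flag; both default to off so nothing changes silently.
    optimize_for_metrics: bool = False
    optimize_for_dimensions: bool = False  # proposed, opt-in


def may_skip_dictionary(col: ColumnInfo, opts: DictionaryOptions) -> bool:
    """Return True if the column is a candidate for no-dictionary encoding."""
    # An inverted index needs a dictionary, so never skip it in that case.
    if col.needs_inverted_index:
        return False
    if col.is_metric:
        return opts.optimize_for_metrics
    # Dimensions are only considered when the separate flag is set.
    return opts.optimize_for_dimensions
```

   With this split, a user who has only the metric flag set keeps dictionaries on all dimensions, which avoids the regression raised in the review.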


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

