[GitHub] [druid] clintropolis commented on pull request #12291: adjust topn heap operation when string is dictionary encoded, but not uniquely

GitBox Tue, 08 Mar 2022 11:49:47 -0800


clintropolis commented on pull request #12291:
URL: https://github.com/apache/druid/pull/12291#issuecomment-1062142843



   > * string dimension, no dictionary <-- i.e., what we get from an expression 
that isn't backed by a single dictionary-coded string column
   
   this one uses `scanAndAggregateWithCardinalityUnknown`, because columns 
which are `1:*` or `*:*` should not report themselves as dictionary encoded 
or as having name lookup possible in advance
   
   > * string dimension, dictionary coded, unique (1-1 mapping from keys -> 
values) <-- i.e., what we get from a regular column in a segment, or a 
dictionary-coded string column plus an ExtractionFn that is ONE_TO_ONE
   > * string dimension, dictionary coded, nonunique <-- i.e., what we get from 
an expression backed by a single dictionary-coded string column, or a 
dictionary-coded string column plus an ExtractionFn that is MANY_TO_ONE, or an 
IndexedTable
   
   these two cases use `scanAndAggregateWithCardinalityKnown`. Prior to this 
patch the latter case used `scanAndAggregateWithCardinalityUnknown`, which is 
better for `IndexedTable` (since its dictionary ids never repeat, it always 
still has to perform the `lookupName` value lookup and the hash table lookup), 
but far worse for expressions and lookups, which currently never report their 
dictionaries as unique and potentially have many repeated dictionary ids.
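   
   To make the trade-off concrete, here is a minimal sketch of the two 
strategies as described above. It is not Druid's actual code: the class 
`TopNScanSketch`, both method signatures, and the `long[1]` counter standing 
in for a row's aggregators are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Illustrative sketch only; names and signatures are hypothetical, not Druid's API.
final class TopNScanSketch {

    // Cardinality known: cache the per-value counter in an array indexed by
    // dictionary id, so the name lookup and hash table lookup happen only the
    // first time each id is seen. Ids that map to the same value share a counter.
    static Map<String, long[]> scanWithCardinalityKnown(
            int[] rowIds, int cardinality, IntFunction<String> lookupName) {
        Map<String, long[]> countersByValue = new HashMap<>();
        long[][] countersById = new long[cardinality][];
        for (int id : rowIds) {
            long[] counter = countersById[id];
            if (counter == null) {
                String value = lookupName.apply(id);
                counter = countersByValue.computeIfAbsent(value, k -> new long[1]);
                countersById[id] = counter;
            }
            counter[0]++;
        }
        return countersByValue;
    }

    // Cardinality unknown: no per-id cache, so every row pays for a name
    // lookup plus a hash table lookup keyed by the value.
    static Map<String, long[]> scanWithCardinalityUnknown(
            int[] rowIds, IntFunction<String> lookupName) {
        Map<String, long[]> countersByValue = new HashMap<>();
        for (int id : rowIds) {
            String value = lookupName.apply(id);
            countersByValue.computeIfAbsent(value, k -> new long[1])[0]++;
        }
        return countersByValue;
    }
}
```

   In this sketch, when dictionary ids repeat often, the first method performs 
one lookup per distinct id rather than one per row. When ids never repeat, 
the per-id cache never hits and the array is pure overhead.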
   
   Should I do something to clear up the javadocs?


