[GitHub] [druid] maytasm commented on a change in pull request #10053: fix topn on string columns with non-sorted or non-unique dictionaries

GitBox Thu, 18 Jun 2020 20:08:32 -0700


maytasm commented on a change in pull request #10053:
URL: https://github.com/apache/druid/pull/10053#discussion_r442605486




##########
File path: processing/src/main/java/org/apache/druid/query/topn/TopNQueryEngine.java
##########
@@ -126,14 +126,20 @@ private TopNMapFn getMapFn(
         // Once we have arbitrary dimension types following check should be replaced by checking
         // that the column is of type long and single-value.
         dimension.equals(ColumnHolder.TIME_COLUMN_NAME)
-        ) {
+    ) {
       // A special TimeExtractionTopNAlgorithm is required, since DimExtractionTopNAlgorithm
       // currently relies on the dimension cardinality to support lexicographic sorting
       topNAlgorithm = new TimeExtractionTopNAlgorithm(adapter, query);
     } else if (selector.isHasExtractionFn()) {
       topNAlgorithm = new HeapBasedTopNAlgorithm(adapter, query);
-    } else if (columnCapabilities == null || !(columnCapabilities.getType() == ValueType.STRING
-                                               && columnCapabilities.isDictionaryEncoded())) {
+    } else if (
+        columnCapabilities == null ||
+        !(columnCapabilities.getType() == ValueType.STRING &&
+          columnCapabilities.isDictionaryEncoded() &&
+          columnCapabilities.areDictionaryValuesSorted().isTrue() &&
+          columnCapabilities.areDictionaryValuesUnique().isTrue()
+        )
+    ) {
       // Use HeapBasedTopNAlgorithm for non-Strings and for non-dictionary-encoded Strings, and for things we don't know

Review comment:
       Can you update this comment to cover the new conditions, and also mention why we can/want to use HeapBasedTopNAlgorithm for each of them (dictionary not sorted, dictionary not unique, etc.)?
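
       For reference, the new condition reads as "fall back to HeapBasedTopNAlgorithm unless every guarantee is known to hold". Below is a minimal, self-contained Java sketch of that reading. `Tri` and `Caps` are hypothetical stand-ins, not Druid classes; only the shape of the boolean expression is taken from the diff. It shows which cases fall back, not why, and the why is what the updated comment should explain.

```java
public class TopNAlgorithmChoice {
    // Hypothetical three-state flag standing in for the value returned by
    // areDictionaryValuesSorted() / areDictionaryValuesUnique() in the diff.
    enum Tri {
        TRUE, FALSE, UNKNOWN;

        boolean isTrue() {
            return this == TRUE;
        }
    }

    // Hypothetical, simplified stand-in for the column capabilities object.
    static final class Caps {
        final boolean isString;
        final boolean dictionaryEncoded;
        final Tri sorted;
        final Tri unique;

        Caps(boolean isString, boolean dictionaryEncoded, Tri sorted, Tri unique) {
            this.isString = isString;
            this.dictionaryEncoded = dictionaryEncoded;
            this.sorted = sorted;
            this.unique = unique;
        }
    }

    // Mirrors the shape of the condition in the diff: use the heap-based
    // algorithm when capabilities are missing, or when any one of the four
    // guarantees is not known to be true.
    static boolean mustUseHeapBased(Caps caps) {
        return caps == null
            || !(caps.isString
                 && caps.dictionaryEncoded
                 && caps.sorted.isTrue()
                 && caps.unique.isTrue());
    }
}
```

       Note that UNKNOWN is treated the same as FALSE here, since only isTrue() passes the check.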




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
