clintropolis commented on code in PR #19599:
URL: https://github.com/apache/druid/pull/19599#discussion_r3453934815


##########
processing/src/main/java/org/apache/druid/segment/projections/ClusteringVectorColumnSelectorFactory.java:
##########
@@ -181,7 +181,21 @@ public ColumnCapabilities getColumnCapabilities(String 
column)
   {
     final int idx = clusteringColumns.indexOf(column);
     if (idx < 0) {
-      return delegate.getColumnCapabilities(column);
+      // Non-clustering columns are stored per cluster group with per-group 
local dictionaries that are NOT stable
+      // across the concatenating vector cursor. We must not advertise 
dictionary encoding: the vectorized group-by
+      // keys on the selector's (per-group-local) IDs when the column reports 
as dictionary-encoded
+      // (GroupByVectorColumnProcessorFactory#useDictionaryEncodedSelector), 
conflating distinct values across groups.
+      // Reporting non-dictionary-encoded routes it to the value-building 
vector selector, which is correct across
+      // groups.
+      final ColumnCapabilities delegateCapabilities = 
delegate.getColumnCapabilities(column);
+      if (delegateCapabilities == null) {
+        return null;
+      }
+      return ColumnCapabilitiesImpl.copyOf(delegateCapabilities)
+                                   .setDictionaryEncoded(false)

Review Comment:
   cardinality unknown doesn't make sense for the vector engine, dictionary 
encoded selectors are only used when the columns are actually dictionary 
encodable, and callers must check the column capabilities (though i see now 
that there aren't really any javadocs specifying this contract).
   
   I technically should remove the implementation since it will never be used, 
the main reason i left it is because later I'm going to have another pass at 
making this better and will likely need to bring an implementation back for 
that, but I guess I could just do it whenever I make the optimization instead 
of leaving dead code hanging around.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to