gianm commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r582985342



##########
File path: 
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##########
@@ -62,16 +66,29 @@ default int getMaxVectorSize()
   SingleValueDimensionVectorSelector 
makeSingleValueDimensionSelector(DimensionSpec dimensionSpec);
 
   /**
-   * Returns a string-typed, multi-value-per-row column selector. Should only 
be called on columns where
-   * {@link #getColumnCapabilities} indicates they return STRING. Unlike 
{@link #makeSingleValueDimensionSelector},
-   * this should not be called on nonexistent columns.
+   * Returns a dictionary encoded, string-typed, multi-value-per-row column 
selector. Should only be called on columns
+   * where {@link #getColumnCapabilities} indicates they return STRING. Unlike
+   * {@link #makeSingleValueDimensionSelector}, this should not be called on 
nonexistent columns.
    *
    * If you need to write code that adapts to different input types, you 
should write a
    * {@link org.apache.druid.segment.VectorColumnProcessorFactory} and use one 
of the
    * {@link org.apache.druid.segment.ColumnProcessors#makeVectorProcessor} 
functions instead of using this method.
    */
   MultiValueDimensionVectorSelector 
makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
 
+  /**
+   * Returns a dimension object selector. Useful for non-dictionary encoded 
strings, or when using the dictionary
+   * isn't useful. Right now, this should probably only be called on single 
valued STRING inputs (multi-value STRING
+   * vector object selector is not yet implemented), but perhaps a world can 
be imagined where this could work with
+   * COMPLEX too perhaps if it were useful for them to interact with {@link 
DimensionSpec} in some manner? May be called
+   * for non-existent columns.

Review comment:
       These javadocs should explicitly constrain the type of Object that may 
show up in the returned VectorObjectSelector. Is it always going to be a String 
or a null? Might it be `String[]`, if the underlying column is multi-value?

##########
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##########
@@ -32,4 +32,15 @@
   DruidFloatPredicate makeFloatPredicate();
 
   DruidDoublePredicate makeDoublePredicate();
+
+  /**
+   * Object predicate is currently only used by vectorized matchers for 
non-string object selectors. To preserve
+   * behavior with non-vectorized matchers which use a string predicate with 
null inputs for these 'nil' matchers,
+   * do the same thing here...
+   */
+  default Predicate<Object> makeObjectPredicate()

Review comment:
       Based on 
https://github.com/apache/druid/pull/10613#discussion_r578985027 it seems like 
the intent of this method is that it will only be called for complex types (or 
maybe nonexistent columns too). If that's the case, please add some javadoc 
explaining that this method will only be called for complex types. It's 
analogous to "makeObjectProcessor" in "VectorColumnProcessorFactory". (You 
could even `@see` to that method in the javadocs if you want.)

##########
File path: 
processing/src/main/java/org/apache/druid/segment/column/ColumnCapabilitiesImpl.java
##########
@@ -177,6 +177,21 @@ public static ColumnCapabilitiesImpl 
createSimpleNumericColumnCapabilities(Value
     return builder;
   }
 
+  /**
+   * Simple, non dictionary encoded string without bitmap index or anything 
fancy
+   */
+  public static ColumnCapabilitiesImpl createSimpleStringColumnCapabilities()
+  {
+    return new ColumnCapabilitiesImpl().setType(ValueType.STRING)
+                                       .setHasMultipleValues(false)

Review comment:
       Is it ok that hasMultipleValues is `false` here? That seems like the 
potentially-sketchiest of the behaviors, since someone might use this method 
for a multi-value string accidentally. How about naming it 
`createSimpleSingleValueStringColumnCapabilities`?
   
   Method names: the longer the better. Go, Java.

##########
File path: 
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##########
@@ -62,16 +66,29 @@ default int getMaxVectorSize()
   SingleValueDimensionVectorSelector 
makeSingleValueDimensionSelector(DimensionSpec dimensionSpec);
 
   /**
-   * Returns a string-typed, multi-value-per-row column selector. Should only 
be called on columns where
-   * {@link #getColumnCapabilities} indicates they return STRING. Unlike 
{@link #makeSingleValueDimensionSelector},
-   * this should not be called on nonexistent columns.
+   * Returns a dictionary encoded, string-typed, multi-value-per-row column 
selector. Should only be called on columns
+   * where {@link #getColumnCapabilities} indicates they return STRING. Unlike
+   * {@link #makeSingleValueDimensionSelector}, this should not be called on 
nonexistent columns.
    *
    * If you need to write code that adapts to different input types, you 
should write a
    * {@link org.apache.druid.segment.VectorColumnProcessorFactory} and use one 
of the
    * {@link org.apache.druid.segment.ColumnProcessors#makeVectorProcessor} 
functions instead of using this method.
    */
   MultiValueDimensionVectorSelector 
makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
 
+  /**
+   * Returns a dimension object selector. Useful for non-dictionary encoded 
strings, or when using the dictionary
+   * isn't useful. Right now, this should probably only be called on single 
valued STRING inputs (multi-value STRING

Review comment:
       Get rid of the "probably" 🙂
   
   Either it will work or it won't. If it doesn't work, say firmly not to do 
it, and ideally describe what will happen if you do it accidentally (the method 
will throw an error? the method will return a selector with undefined behavior? 
etc). If it does work, but might be a bad idea for some reason, spell that out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to