clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553814648
##########
File path:
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -305,17 +305,29 @@ private static ColumnCapabilities
getEffectiveCapabilities(
originalCapabilities == null ||
ValueType.COMPLEX.equals(originalCapabilities.getType());
if (type == ValueType.STRING) {
- if (!forceSingleValue &&
effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
- return strategyFactory.makeMultiValueDimensionProcessor(
- effectiveCapabilites,
- selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
- );
- } else {
- return strategyFactory.makeSingleValueDimensionProcessor(
- effectiveCapabilites,
- selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
- );
+ if (!forceSingleValue) {
+ // if column is not dictionary encoded (and not non-existent or
complex), use object selector
+ if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+ effectiveCapabilites.areDictionaryValuesUnique().isFalse()
Review comment:
by regular string selector do you mean dimension selector? The answer is
yes we _could_, but if the dictionary values are not unique, i came to the
conclusion that they aren't especially useful since in all the cases I can
think of you can't use them as is and you're going to have to lookup the string
values anyway (e.g. filter matching and vector group by).
I was trying to provide an avenue to avoid the extra steps with expression
virtual columns, and allow an object selector to be created directly since the
string is the useable part in this case, and the dictionary ids just feel like
an obstruction. But maybe this is overly broad and there are cases where we
would want it to act like a dictionary encoded column for some reason that I
have missed?
##########
File path:
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##########
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier<Expr> expr, final
FilterTuning filterTuni
this.filterTuning = filterTuning;
}
+ @Override
+ public boolean canVectorizeMatcher(ColumnInspector inspector)
+ {
+ return expr.get().canVectorize(inspector);
+ }
+
+ @Override
+ public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory
factory)
+ {
+ final Expr theExpr = expr.get();
+
+ DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+ {
+ @Override
+ public Predicate<String> makeStringPredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ @Override
+ public DruidLongPredicate makeLongPredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ @Override
+ public DruidFloatPredicate makeFloatPredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ @Override
+ public DruidDoublePredicate makeDoublePredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ // The hashcode and equals are to make
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+ // DruidPredicateFactory doesn't really need equals or hashcode, in fact
only the 'toString' method is called
Review comment:
yeah, I was planning on addressing this in a separate PR, the comments
are partially to help remind us to do it in case I didn't get to it immediately
##########
File path:
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##########
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier<Expr> expr, final
FilterTuning filterTuni
this.filterTuning = filterTuning;
}
+ @Override
+ public boolean canVectorizeMatcher(ColumnInspector inspector)
+ {
+ return expr.get().canVectorize(inspector);
+ }
+
+ @Override
+ public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory
factory)
+ {
+ final Expr theExpr = expr.get();
+
+ DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+ {
+ @Override
+ public Predicate<String> makeStringPredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ @Override
+ public DruidLongPredicate makeLongPredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ @Override
+ public DruidFloatPredicate makeFloatPredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ @Override
+ public DruidDoublePredicate makeDoublePredicate()
+ {
+ return Evals::asBoolean;
+ }
+
+ // The hashcode and equals are to make
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+ // DruidPredicateFactory doesn't really need equals or hashcode, in fact
only the 'toString' method is called
+ // when testing equality of DimensionPredicateFilter, so it's the truly
required method, but even that seems
+ // strange at best.
+ // Rather than tackle removing the annotation and reworking equality
tests for now, will leave this to refactor
+ // as part of https://github.com/apache/druid/issues/8256 which suggests
combining Filter and DimFilter
+ // implementations, which should clean up some of this mess.
+ @Override
+ public int hashCode()
+ {
+ return super.hashCode();
+ }
+
+ @Override
+ public boolean equals(Object obj)
+ {
+ return super.equals(obj);
+ }
+ };
+
+
+ final ExprType outputType = theExpr.getOutputType(factory);
+
+ if (outputType == null) {
+ // if an expression is vectorizable, but the output type is null, the
result will be null (or the default
+ // value in default mode) because expression is either all null
constants or missing columns
+
+ // in sql compatible mode, this means no matches ever, so just use the
false matcher:
+ if (NullHandling.sqlCompatible()) {
+ return new FalseVectorMatcher(factory.getVectorSizeInspector());
+ }
+ // in default mode, just fallback to using a long matcher since nearly
all boolean-ish expressions
+ // output a long value so it is probably a safe bet? idk, ending up here
by using all null-ish things
+ // in default mode is dancing on the edge of insanity anyway...
+ return
VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+ ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+ ).makeMatcher(predicateFactory);
+ }
+
+ switch (outputType) {
+ case LONG:
+ return
VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+ ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+ ).makeMatcher(predicateFactory);
+ case DOUBLE:
+ return
VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor(
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE),
+ ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+ ).makeMatcher(predicateFactory);
+ case STRING:
+ return
VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor(
+ // using 'numeric' capabilities creator so we are configured to
NOT be dictionary encoded, etc
+
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING),
Review comment:
it is strange, but on purpose for the reason the comment says since the
numeric capabilities defaults are correct in this case. I will make sure all
current users are numeric types, and if so then just make these capabilites
from scratch because we should be counting on something geared towards numeric
behavior in case those change in some unexpected way. Or, if it is used by
other non-numbers then rename it to something more appropriate, like
`createSimpleSingleValueColumnCapabilites` or similar
##########
File path:
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##########
@@ -57,6 +57,15 @@ default int getMaxVectorSize()
*/
MultiValueDimensionVectorSelector
makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
+ /**
+ * Returns an object selector for string columns, useful for non-dictionary
encoded strings, or when
Review comment:
hmm, i have no idea why I named this function like this, it should
probably be called `makeDimensionObjectSelector`. Its just expected to make a
`VectorObjectSelector` from a `DimensionSpec` rather than a raw column name
##########
File path:
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##########
@@ -27,6 +27,13 @@
{
Predicate<String> makeStringPredicate();
+ default Predicate<Object> makeObjectPredicate()
+ {
+ // default to try to use string predicate;
+ final Predicate<String> stringPredicate = makeStringPredicate();
+ return o -> stringPredicate.apply((String) o);
Review comment:
uh, it is sort of built on hope right now... the hope that it will
probably be a string because the only context it will currently happen in is
from making vector matcher object processors on string expression columns, but
if we opened it up to additional non-primitive number types like array
expressions and complex column types then this would need to be reconsidered,
or stronger checks in place that the filter supports filtering on the
underlying object type. I might need to think about this a bit deeper.
##########
File path:
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##########
@@ -27,6 +27,13 @@
{
Predicate<String> makeStringPredicate();
+ default Predicate<Object> makeObjectPredicate()
+ {
+ // default to try to use string predicate;
Review comment:
currently I don't think it should be possible because only expression
filter will make a matcher factory with object matching predicate and only for
string column types, but i guess cast exceptions if an object selector of a
complex column somehow got in here from future code, see other comment about
thinking a bit further about this 😅
##########
File path:
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java
##########
@@ -64,7 +65,16 @@ public GroupByVectorColumnSelector
makeMultiValueDimensionProcessor(
ValueType.STRING == capabilities.getType(),
"groupBy dimension processors must be STRING typed"
);
- throw new UnsupportedOperationException("Multi-value dimensions not yet
implemented for vectorized groupBys");
+ throw new UnsupportedOperationException("Multi-value dimensions are not
yet implemented for vectorized groupBys");
+ }
+
+ @Override
+ public GroupByVectorColumnSelector makeObjectProcessor(
+ ColumnCapabilities capabilities,
+ VectorObjectSelector selector
+ )
+ {
+ throw new UnsupportedOperationException("Object columns are not yet
implemented for vectorized groupBys");
Review comment:
yes, I have a related branch that I will open as a follow-up to this PR
that adds support for group-by queries on vectorized string expressions, using
an object selector and a newly added dictionary building
`GroupByVectorColumnSelector` similar to the non-vectorized group by code-path.
In that branch this part checks that the column capabilities are string typed,
and if so then will make the new dictionary building selector, else throw an
exception for being an unsupported type.
It seems beneficial to not have to make the string expression vector results
have to masquerade as dictionary encoded, to be translated to string values to
be re-encoded into a new dictionary, which is part of my reason for pushing for
object selectors for the use case
##########
File path:
processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java
##########
@@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector
decorate(MultiValueDimensionVectorSele
throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName());
}
+ default VectorObjectSelector decorate(VectorObjectSelector selector)
Review comment:
This is used with the poorly named
`VectorColumnSelectorFactory.makeStringObjectSelector`, which should be renamed
`VectorColumnSelectorFactory.makeDimensionObjectSelector` or something, which
is for doing dimensiony things with a vector object selector similar to what is
done for `SingleValueDimensionVectorSelector` or
`MultiValueDimensionVectorSelector`. There are currently no implementations of
this, like is the case with the other decorate methods with vector selectors, i
was just trying for symmetry here so that the object selector could behave
similar to the other string selectors when used in this context
##########
File path:
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
@Override
public boolean canVectorize(InputBindingInspector inspector)
{
- return inspector.areNumeric(left, right) && inspector.canVectorize(left,
right);
Review comment:
i _think_ canVectorize will still end up false because array
types/functions are not yet vectorizable, but yeah i guess array types would
not be well handled here even if they were, though maybe that is the case for
non-vectorized expression as well. I'll do a bit of exploration and see what up
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]