clintropolis commented on code in PR #16338:
URL: https://github.com/apache/druid/pull/16338#discussion_r1643508840


##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java:
##########
@@ -138,12 +138,22 @@ public void close()
             try {
               final VectorColumnSelectorFactory columnSelectorFactory = 
cursor.getColumnSelectorFactory();
               final List<GroupByVectorColumnSelector> dimensions = 
query.getDimensions().stream().map(
-                  dimensionSpec ->
-                      ColumnProcessors.makeVectorProcessor(
+                  dimensionSpec -> {
+                    if (dimensionSpec instanceof DefaultDimensionSpec) {

Review Comment:
   is this just a sanity check since currently only `DefaultDimensionSpec` can 
vectorize, and ideally it will be the only one since `DimensionSpec` that do 
fancy stuff need to just be virtual columns.



##########
processing/src/main/java/org/apache/druid/segment/virtual/ExpressionDeferredGroupByVectorColumnSelector.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.virtual;
+
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExpressionType;
+import org.apache.druid.math.expr.InputBindings;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.collection.MemoryPointer;
+import 
org.apache.druid.query.groupby.epinephelinae.vector.GroupByVectorColumnSelector;
+import org.apache.druid.segment.column.RowSignature;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Implementation of {@link GroupByVectorColumnSelector} that uses a wide key 
representing all expression inputs
+ * to enable deferring expression evaluation to {@link 
#writeKeyToResultRow(MemoryPointer, int, ResultRow, int)}.
+ *
+ * For example, the expression "coalesce(x, y)" would write a key composed of 
(x, y) in {@link #writeKeys}, then
+ * compute "coalesce(x, y)" in {@link #writeKeyToResultRow}.
+ */
+public class ExpressionDeferredGroupByVectorColumnSelector implements 
GroupByVectorColumnSelector
+{
+  private final Expr expr;
+  private final List<GroupByVectorColumnSelector> subSelectors;
+  private final int exprKeyBytes;
+
+  /**
+   * Used internally by {@link #writeKeyToResultRow(MemoryPointer, int, 
ResultRow, int)} to populate inputs
+   * for the expression.
+   */
+  private final ResultRow tmpResultRow;
+
+  /**
+   * Used internally by {@link #writeKeyToResultRow(MemoryPointer, int, 
ResultRow, int)} to evaluate the expression
+   * on {@link #tmpResultRow}.
+   */
+  private final Expr.ObjectBinding tmpResultRowBindings;
+
+  ExpressionDeferredGroupByVectorColumnSelector(
+      final Expr expr,
+      final RowSignature exprInputSignature,
+      final List<GroupByVectorColumnSelector> subSelectors
+  )
+  {
+    this.expr = expr;
+    this.subSelectors = subSelectors;
+    this.exprKeyBytes = 
subSelectors.stream().mapToInt(GroupByVectorColumnSelector::getGroupingKeySize).sum();

Review Comment:
   nit: i think this should be the same size as `exprInputSignature.size()`? if 
true i think we should move this into that loop initializing the input bindings 
suppliers



##########
processing/src/main/java/org/apache/druid/query/groupby/DeferExpressionDimensions.java:
##########
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.virtual.ExpressionPlan;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+
+import java.util.List;
+
+/**
+ * Controls deferral of {@link ExpressionVirtualColumn} in {@link 
GroupByQuery}.
+ */
+public enum DeferExpressionDimensions
+{
+  SINGLE_STRING("singleString") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      return false;
+    }
+  },
+
+  /**
+   * Defer expressions when their input variables are all fixed-width types 
(primitive numbers, or dictionary encoded).
+   */
+  FIXED_WIDTH("fixedWidth") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      if (isInnatelyDeferrable(plan, requiredBindingsList, inspector)) {
+        return false;
+      }
+
+      for (final String requiredBinding : requiredBindingsList) {
+        final ColumnCapabilities capabilities = 
inspector.getColumnCapabilities(requiredBinding);
+        if (capabilities == null) {
+          return false;
+        }
+
+        if (!capabilities.isNumeric() && 
!capabilities.isDictionaryEncoded().isTrue()) {
+          // Not fixed-width.
+          return false;
+        }
+      }
+
+      return true;
+    }
+  },
+
+  /**
+   * Defer expressions when their input variables are all fixed-width types 
(primitive numbers, or dictionary encoded).
+   */
+  FIXED_WIDTH_NON_NUMERIC("fixedWidthNonNumeric") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      if (isInnatelyDeferrable(plan, requiredBindingsList, inspector)) {
+        return false;
+      }
+
+      boolean allNumericInputs = true;
+
+      for (final String requiredBinding : requiredBindingsList) {
+        final ColumnCapabilities capabilities = 
inspector.getColumnCapabilities(requiredBinding);
+        if (capabilities == null) {
+          return false;
+        }
+
+        allNumericInputs = allNumericInputs && capabilities.isNumeric();
+
+        if (!capabilities.isNumeric() && 
!capabilities.isDictionaryEncoded().isTrue()) {

Review Comment:
   should this check `areDictionaryValuesUnique`?



##########
processing/src/main/java/org/apache/druid/query/groupby/DeferExpressionDimensions.java:
##########
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.virtual.ExpressionPlan;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+
+import java.util.List;
+
+/**
+ * Controls deferral of {@link ExpressionVirtualColumn} in {@link 
GroupByQuery}.
+ */
+public enum DeferExpressionDimensions
+{
+  SINGLE_STRING("singleString") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      return false;
+    }
+  },
+
+  /**
+   * Defer expressions when their input variables are all fixed-width types 
(primitive numbers, or dictionary encoded).
+   */
+  FIXED_WIDTH("fixedWidth") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      if (isInnatelyDeferrable(plan, requiredBindingsList, inspector)) {
+        return false;
+      }
+
+      for (final String requiredBinding : requiredBindingsList) {
+        final ColumnCapabilities capabilities = 
inspector.getColumnCapabilities(requiredBinding);
+        if (capabilities == null) {
+          return false;
+        }
+
+        if (!capabilities.isNumeric() && 
!capabilities.isDictionaryEncoded().isTrue()) {
+          // Not fixed-width.
+          return false;
+        }
+      }
+
+      return true;
+    }
+  },
+
+  /**
+   * Defer expressions when their input variables are all fixed-width types 
(primitive numbers, or dictionary encoded).
+   */
+  FIXED_WIDTH_NON_NUMERIC("fixedWidthNonNumeric") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      if (isInnatelyDeferrable(plan, requiredBindingsList, inspector)) {
+        return false;
+      }
+
+      boolean allNumericInputs = true;
+
+      for (final String requiredBinding : requiredBindingsList) {
+        final ColumnCapabilities capabilities = 
inspector.getColumnCapabilities(requiredBinding);
+        if (capabilities == null) {
+          return false;
+        }
+
+        allNumericInputs = allNumericInputs && capabilities.isNumeric();
+
+        if (!capabilities.isNumeric() && 
!capabilities.isDictionaryEncoded().isTrue()) {
+          // Not fixed-width.
+          return false;
+        }
+      }
+
+      return !allNumericInputs || (plan.getOutputType() != null && 
!plan.getOutputType().isNumeric());
+    }
+  },
+
+  ALWAYS("always") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      return !isInnatelyDeferrable(plan, requiredBindingsList, inspector);
+    }
+  };
+
+  public static final String JSON_KEY = "deferExpressionDimensions";
+
+  private final String jsonName;
+
+  DeferExpressionDimensions(String jsonName)
+  {
+    this.jsonName = jsonName;
+  }
+
+  @JsonCreator
+  public static DeferExpressionDimensions fromString(final String jsonName)
+  {
+    for (final DeferExpressionDimensions value : values()) {
+      if (value.jsonName.equals(jsonName)) {
+        return value;
+      }
+    }
+
+    throw new IAE("Invalid value[%s] for[%s]", jsonName, JSON_KEY);
+  }
+
+  public abstract boolean useDeferredGroupBySelector(
+      ExpressionPlan plan,
+      List<String> requiredBindingsList,
+      ColumnInspector inspector
+  );
+
+  @Override
+  @JsonValue
+  public String toString()
+  {
+    return jsonName;
+  }
+
+  /**
+   * Whether the given expression can be deferred innately by the selector 
created by
+   * {@link 
ExpressionVirtualColumn#makeSingleValueVectorDimensionSelector(DimensionSpec, 
VectorColumnSelectorFactory)}.
+   *
+   * In this case, all options for this enum return false from
+   * {@link #useDeferredGroupBySelector(ExpressionPlan, List, 
ColumnInspector)}, because there is no need to defer
+   * redundantly.
+   */
+  private static boolean isInnatelyDeferrable(
+      ExpressionPlan plan,
+      List<String> requiredBindingsList,
+      ColumnInspector inspector
+  )
+  {
+    if (plan.getOutputType() != null
+        && plan.getOutputType().is(ExprType.STRING)
+        && requiredBindingsList.size() <= 1) {
+      for (final String requiredBinding : requiredBindingsList) {
+        final ColumnCapabilities requiredBindingCapabilities = 
inspector.getColumnCapabilities(requiredBinding);
+
+        if (requiredBindingCapabilities == null
+            || !requiredBindingCapabilities.is(ValueType.STRING)
+            || !requiredBindingCapabilities.isDictionaryEncoded().isTrue()) {
+          return false;
+        }
+      }
+
+      return true;

Review Comment:
   does this also need to check `ColumnCapabilities.areDictionaryValuesUnique`?



##########
processing/src/main/java/org/apache/druid/query/groupby/DeferExpressionDimensions.java:
##########
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonValue;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.virtual.ExpressionPlan;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+
+import java.util.List;
+
+/**
+ * Controls deferral of {@link ExpressionVirtualColumn} in {@link 
GroupByQuery}.
+ */
+public enum DeferExpressionDimensions
+{
+  SINGLE_STRING("singleString") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      return false;
+    }
+  },
+
+  /**
+   * Defer expressions when their input variables are all fixed-width types 
(primitive numbers, or dictionary encoded).
+   */
+  FIXED_WIDTH("fixedWidth") {
+    @Override
+    public boolean useDeferredGroupBySelector(
+        ExpressionPlan plan,
+        List<String> requiredBindingsList,
+        ColumnInspector inspector
+    )
+    {
+      if (isInnatelyDeferrable(plan, requiredBindingsList, inspector)) {
+        return false;
+      }
+
+      for (final String requiredBinding : requiredBindingsList) {
+        final ColumnCapabilities capabilities = 
inspector.getColumnCapabilities(requiredBinding);
+        if (capabilities == null) {
+          return false;
+        }
+
+        if (!capabilities.isNumeric() && 
!capabilities.isDictionaryEncoded().isTrue()) {

Review Comment:
   should this check `areDictionaryValuesUnique`? Technically it probably 
doesn't have to since i think the only things that set this to false while 
retaining `isDictionaryEncoded` as true are non-default dimension specs like 
extractionFn which are not 1:1, but also not supported for use with 
vectorization.
   
   Also someday i'd like to have a cooler way to detect fixed width types that 
works for complex too, but that can wait for a future change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to