jihoonson commented on a change in pull request #10518:
URL: https://github.com/apache/druid/pull/10518#discussion_r532941216
##########
File path:
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java
##########
@@ -576,7 +594,12 @@ private static ValueExtractFunction
makeValueExtractFunction(
// Add aggregations.
final int resultRowAggregatorStart =
query.getResultRowAggregatorStart();
for (int i = 0; i < entry.getValues().length; i++) {
- resultRow.set(resultRowAggregatorStart + i, entry.getValues()[i]);
+ if (dimsToInclude != null && groupingAggregatorsBitSet.get(i)) {
Review comment:
I think this `if` clause is probably fine. However, if I'm reading the code
correctly, the new aggregators seem to do nothing, and [even their results are
not
used](https://github.com/apache/druid/pull/10518/files#diff-8f7ac0f08ac33a05571de6df8cc7d76932a284b94c1dec98e1b18aca7a0240bcR598),
yet those aggregators are still involved in hash aggregation. This worries me
more, because it means serializing/deserializing the aggregator values to/from
off-heap memory, which is pretty expensive. Since `GroupingAggregatorFactory`
is a special aggregator type that cannot be computed by regular aggregation,
can we rewrite the query so that these aggregators are not computed in hash
aggregation, and instead add their results to the result row as you do now?
> Right. This is one change I am a bit anxious about. is there any existing
> benchmark I could use?

`GroupByBenchmark` is the easiest place for such a benchmark, but it is
probably not the best one because it measures the query performance of
historicals. I would suggest adding a new benchmark for broker performance,
but this is not a blocker. I'm OK with adding a new benchmark to
`GroupByBenchmark` for now.
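
To make the rewrite suggestion a bit more concrete, here is a rough sketch of the idea, not actual Druid code. The class `GroupingRewriteSketch` and the method `fillAggregators` are hypothetical names. The sketch assumes the grouping aggregators have been removed from the hash-aggregated set, so `hashedValues` holds only the regular aggregator results, and the constant grouping values are known per subtotal spec.

```java
import java.util.BitSet;

public class GroupingRewriteSketch
{
  /**
   * Hypothetical helper: builds the aggregator part of a result row.
   *
   * @param numAggregators    total number of aggregators in the original query
   * @param groupingPositions bit i is set if aggregator i is a grouping aggregator
   * @param groupingValues    constant values of the grouping aggregators, in query order
   * @param hashedValues      values of the remaining aggregators, read from hash aggregation
   */
  public static Object[] fillAggregators(
      int numAggregators,
      BitSet groupingPositions,
      long[] groupingValues,
      Object[] hashedValues
  )
  {
    Object[] out = new Object[numAggregators];
    int hashedIdx = 0;
    int groupingIdx = 0;
    for (int i = 0; i < numAggregators; i++) {
      if (groupingPositions.get(i)) {
        // Constant value, never written to or read from off-heap memory.
        out[i] = groupingValues[groupingIdx++];
      } else {
        out[i] = hashedValues[hashedIdx++];
      }
    }
    return out;
  }
}
```

With this shape, only the regular aggregators would pay the serialization cost, and the grouping values are filled in when the result row is built.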
##########
File path:
processing/src/main/java/org/apache/druid/query/aggregation/GroupingAggregatorFactory.java
##########
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.druid.annotations.EverythingIsNonnullByDefault;
+import org.apache.druid.query.aggregation.constant.LongConstantAggregator;
+import
org.apache.druid.query.aggregation.constant.LongConstantBufferAggregator;
+import
org.apache.druid.query.aggregation.constant.LongConstantVectorAggregator;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.utils.CollectionUtils;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+@EverythingIsNonnullByDefault
+public class GroupingAggregatorFactory extends AggregatorFactory
+{
+ private static final Comparator<Long> VALUE_COMPARATOR = Long::compare;
+ private final String name;
+ private final List<String> groupings;
+ private final long value;
+ @Nullable
+ private final Set<String> keyDimensions;
+
+ @JsonCreator
+ public GroupingAggregatorFactory(
+ @JsonProperty("name") String name,
+ @JsonProperty("groupings") List<String> groupings
+ )
+ {
+ this(name, groupings, null);
+ }
+
+ @VisibleForTesting
+ GroupingAggregatorFactory(
+ String name,
+ List<String> groupings,
+ @Nullable Set<String> keyDimensions
+ )
+ {
+ Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator
name");
+ this.name = name;
+ this.groupings = groupings;
+ this.keyDimensions = keyDimensions;
+ value = groupingId(groupings, keyDimensions);
+ }
+
+ @Override
+ public Aggregator factorize(ColumnSelectorFactory metricFactory)
+ {
+ return new LongConstantAggregator(value);
+ }
+
+ @Override
+ public BufferAggregator factorizeBuffered(ColumnSelectorFactory
metricFactory)
+ {
+ return new LongConstantBufferAggregator(value);
+ }
+
+ @Override
+ public VectorAggregator factorizeVector(VectorColumnSelectorFactory
selectorFactory)
+ {
+ return new LongConstantVectorAggregator(value);
+ }
+
+ @Override
+ public boolean canVectorize(ColumnInspector columnInspector)
+ {
+ return true;
+ }
+
+ /**
+ * Replace the param {@code keyDimensions} with the new set of key dimensions
+ */
+ public GroupingAggregatorFactory withKeyDimensions(Set<String>
newKeyDimensions)
+ {
+ return new GroupingAggregatorFactory(name, groupings, newKeyDimensions);
+ }
+
+ @Override
+ public Comparator getComparator()
+ {
+ return VALUE_COMPARATOR;
+ }
+
+ @JsonProperty
+ public List<String> getGroupings()
+ {
+ return groupings;
+ }
+
+ @Override
+ @JsonProperty
+ public String getName()
+ {
+ return name;
+ }
+
+ public long getValue()
+ {
+ return value;
+ }
+
+ @Nullable
+ @Override
+ public Object combine(@Nullable Object lhs, @Nullable Object rhs)
+ {
+ if (null == lhs) {
+ return rhs;
+ }
+ return lhs;
+ }
+
+ @Override
+ public AggregatorFactory getCombiningFactory()
+ {
+ return new GroupingAggregatorFactory(name, groupings, keyDimensions);
+ }
+
+ @Override
+ public List<AggregatorFactory> getRequiredColumns()
+ {
+ return Collections.singletonList(new GroupingAggregatorFactory(name,
groupings, keyDimensions));
+ }
+
+ @Override
+ public Object deserialize(Object object)
+ {
+ return object;
+ }
+
+ @Nullable
+ @Override
+ public Object finalizeComputation(@Nullable Object object)
+ {
+ return object;
+ }
+
+ @Override
+ public List<String> requiredFields()
+ {
+ // The aggregator doesn't need to read any fields.
+ return Collections.emptyList();
+ }
+
+ @Override
+ public ValueType getType()
+ {
+ return ValueType.LONG;
+ }
+
+ @Override
+ public ValueType getFinalizedType()
+ {
+ return ValueType.LONG;
+ }
+
+ @Override
+ public int getMaxIntermediateSize()
+ {
+ return Long.BYTES;
+ }
+
+ @Override
+ public byte[] getCacheKey()
+ {
+ CacheKeyBuilder keyBuilder = new
CacheKeyBuilder(AggregatorUtil.GROUPING_CACHE_TYPE_ID)
+ .appendStrings(groupings);
+ if (null != keyDimensions) {
+ keyBuilder.appendStrings(keyDimensions);
+ }
+ return keyBuilder.build();
+ }
+
+ /**
+ * Gives the list of grouping dimensions, return a long value where each bit
at position X in the returned value
+ * corresponds to the dimension in groupings at same position X. X is the
position relative to the right end. if
+ * keyDimensions contain the grouping dimension at position X, the bit is
set to 1 at position X, otherwise it is
+ * set to 0. An example adapted from Microsoft SQL documentation
Review comment:
MySQL and Postgres do seem to accept more than one expression. Check out
the docs for
[postgres](https://www.postgresql.org/docs/13/functions-aggregate.html) and
[mysql](https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_grouping).
This behaviour seems consistent across popular database systems such as
Oracle and SQL Server. One exception is that SQL Server supports both
`grouping()` and `grouping_id()`, which behave in opposite ways to each
other. [Calcite's behaviour is also
consistent](https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql/fun/SqlGroupingFunction.java).
Perhaps you were looking at an old version of the comment. I think Druid's
behaviour should also match that of most other database systems.
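
To illustrate the difference, here is a minimal sketch, not the code in this PR. `GroupingIdSketch` and both method names are hypothetical. The first method follows the javadoc as written in this diff (a bit is 1 when the key dimensions contain the grouping dimension). The second follows the Postgres convention, where a bit is 1 when the expression is not included in the current grouping set. In both, the last entry of `groupings` maps to the least-significant bit. The nullable `keyDimensions` case is ignored here for simplicity.

```java
import java.util.List;
import java.util.Set;

public class GroupingIdSketch
{
  /** Bit is 1 when keyDimensions contains the dimension (as described in this diff's javadoc). */
  public static long groupingIdAsInDiff(List<String> groupings, Set<String> keyDimensions)
  {
    long id = 0;
    for (String dim : groupings) {
      id <<= 1;
      if (keyDimensions.contains(dim)) {
        id |= 1L;
      }
    }
    return id;
  }

  /** Bit is 1 when keyDimensions does NOT contain the dimension (Postgres-style GROUPING). */
  public static long groupingIdSqlStyle(List<String> groupings, Set<String> keyDimensions)
  {
    long id = 0;
    for (String dim : groupings) {
      id <<= 1;
      if (!keyDimensions.contains(dim)) {
        id |= 1L;
      }
    }
    return id;
  }
}
```

For groupings `["dim1", "dim2"]` and key dimensions `{"dim1"}`, the first method returns 2 (bits `10`) and the second returns 1 (bits `01`).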
##########
File path: docs/querying/aggregations.md
##########
@@ -426,3 +426,25 @@ This makes it possible to compute the results of a
filtered and an unfiltered ag
"aggregator" : <aggregation>
}
```
+
+### Grouping Aggregator
+
+A grouping aggregator can only be used as part of GroupBy queries which have a
subtotal spec. It returns a number for
+each output row that lets you infer whether a particular dimension is included
in the sub-grouping used for that row. You can pass
+a *non-empty* list of dimensions to this aggregator which *must* be a subset
of dimensions that you are grouping on.
+E.g if the aggregator has `["dim1", "dim2"]` as input dimensions and
`[["dim1", "dim2"], ["dim1"], ["dim2"], []]` as subtotals,
+following can be the possible output of the aggregator
+
+| subtotal used in query | Output | (bits representation) |
+|------------------------|--------|-----------------------|
+| `["dim1", "dim2"]` | 3 | (11) |
+| `["dim1"]` | 2 | (10) |
+| `["dim2"]` | 1 | (01) |
+| `[]` | 0 | (00) |
+
+As illustrated in above example, output number can be though of as an unsigned
n bit number where n is the number of dimensions passed to the aggregator.
Review comment:
typo: though -> thought
##########
File path:
processing/src/main/java/org/apache/druid/query/aggregation/constant/LongConstantAggregator.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.constant;
+
+import org.apache.druid.query.aggregation.Aggregator;
+
+public class LongConstantAggregator implements Aggregator
Review comment:
If we rewrite the query to not include `GroupingAggregatorFactory`, as I
suggested in my other comment, these aggregators will no longer be in use.
`GroupingAggregatorFactory` can instead throw an exception when `factorize()`
is called.
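
Something along these lines is what I have in mind. This is only a sketch with stand-in types: `ThrowingFactorySketch`, its nested `Aggregator` interface, and `GroupingFactory` are hypothetical and much simpler than the real Druid classes, and the exception type and message are just one possible choice.

```java
public class ThrowingFactorySketch
{
  /** Stand-in for Druid's Aggregator interface (hypothetical, empty on purpose). */
  public interface Aggregator
  {
  }

  /** Stand-in for a grouping factory that must never take part in regular aggregation. */
  public static class GroupingFactory
  {
    private final String name;

    public GroupingFactory(String name)
    {
      this.name = name;
    }

    public Aggregator factorize()
    {
      // The query is expected to be rewritten before aggregation,
      // so reaching this point indicates a bug in the caller.
      throw new UnsupportedOperationException(
          "Grouping aggregator [" + name + "] cannot be computed by regular aggregation"
      );
    }
  }
}
```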