jihoonson commented on a change in pull request #10518: URL: https://github.com/apache/druid/pull/10518#discussion_r535700672
########## File path: processing/src/main/java/org/apache/druid/query/aggregation/GroupingAggregatorFactory.java ##########

```diff
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.druid.annotations.EverythingIsNonnullByDefault;
+import org.apache.druid.query.aggregation.constant.LongConstantAggregator;
+import org.apache.druid.query.aggregation.constant.LongConstantBufferAggregator;
+import org.apache.druid.query.aggregation.constant.LongConstantVectorAggregator;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.utils.CollectionUtils;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+@EverythingIsNonnullByDefault
+public class GroupingAggregatorFactory extends AggregatorFactory
```

Review comment: Please add some javadoc. It would be nice to include, but not necessarily be limited to, the following:
- This factory is for computing the `grouping` function.
- This factory creates `LongConstant*Aggregators`. Unlike other aggregators, the `LongConstant*Aggregators` created by this factory do nothing to compute the `grouping` function. Instead, they are used only to hold the positions of `grouping` results in the `ResultRow`.
- The actual computation of the `grouping` function is done before processing each subtotal. The result of the `LongConstant*Aggregators` is _not_ used, but is replaced with the precomputed result when iterating over the result of the subtotal computation. See `RowBasedGrouperHelper.makeGrouperIterator()` for more details.
- There are other approaches that could implement the same functionality. We chose this one because it seems more stable and less complex than the others. See https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.

########## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java ##########

```diff
@@ -576,7 +594,12 @@ private static ValueExtractFunction makeValueExtractFunction(
       // Add aggregations.
       final int resultRowAggregatorStart = query.getResultRowAggregatorStart();
       for (int i = 0; i < entry.getValues().length; i++) {
-        resultRow.set(resultRowAggregatorStart + i, entry.getValues()[i]);
+        if (dimsToInclude != null && groupingAggregatorsBitSet.get(i)) {
+          resultRow.set(resultRowAggregatorStart + i, groupingAggregatorValues[i]);
```

Review comment: Please add a comment about what is happening here.
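The substitution the comment above asks to be documented can be illustrated with a small self-contained sketch. The variable names mirror those in the diff, but everything here uses plain JDK types rather than Druid's classes, and the concrete values are invented for illustration: the stored value of a grouping "aggregator" is a placeholder that gets replaced with the grouping value precomputed for the current subtotal spec.

```java
import java.util.Arrays;
import java.util.BitSet;

public class GroupingSubstitutionSketch
{
  public static void main(String[] args)
  {
    // Aggregated values for one grouper entry. Index 1 belongs to a grouping
    // "aggregator"; its stored value (-1) is a placeholder and is never
    // returned to the user.
    Object[] entryValues = {10L, -1L, 3.5};

    // Which aggregator positions belong to grouping aggregators, and the
    // grouping values precomputed for the current subtotal spec.
    BitSet groupingAggregatorsBitSet = new BitSet();
    groupingAggregatorsBitSet.set(1);
    long[] groupingAggregatorValues = {0L, 2L, 0L};

    Object[] resultRow = new Object[entryValues.length];
    int resultRowAggregatorStart = 0;
    for (int i = 0; i < entryValues.length; i++) {
      if (groupingAggregatorsBitSet.get(i)) {
        // Substitute the precomputed grouping value for the placeholder.
        resultRow[resultRowAggregatorStart + i] = groupingAggregatorValues[i];
      } else {
        resultRow[resultRowAggregatorStart + i] = entryValues[i];
      }
    }
    System.out.println(Arrays.toString(resultRow)); // [10, 2, 3.5]
  }
}
```

The point of the design is that the grouping aggregator never computes anything during aggregation; the correct value is spliced in only when the subtotal results are iterated.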
########## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java ##########

```diff
@@ -576,7 +594,12 @@ private static ValueExtractFunction makeValueExtractFunction(
       // Add aggregations.
       final int resultRowAggregatorStart = query.getResultRowAggregatorStart();
       for (int i = 0; i < entry.getValues().length; i++) {
-        resultRow.set(resultRowAggregatorStart + i, entry.getValues()[i]);
+        if (dimsToInclude != null && groupingAggregatorsBitSet.get(i)) {
```

Review comment: I talked to @abhishekagarwal87 offline. My biggest concern is that the special handling for `GroupingAggregatorFactory` seems pretty magical, because the behaviour of the factory and its aggregators is different from the others. What I suggested above makes it more special but less magical, which I think is less confusing. @abhishekagarwal87's concern is mostly about the complexity of query rewriting, which requires adjusting the result row signature (because the result of the `grouping` function will be missing at certain points during a query after the rewrite). I think we could still handle this, but it might be fragile because we don't have a systematic way to handle result row signature changes during a query, so the logic to handle them would be ad hoc. I agree with this view, so the current structure seems reasonable, even though I still think that, ideally, we should not involve `GroupingAggregatorFactory` in hash aggregation. Maybe we can do that in the future once we have a better way to handle query rewriting and result row signature changes.

########## File path: processing/src/main/java/org/apache/druid/query/aggregation/GroupingAggregatorFactory.java ##########

```diff
@@ -0,0 +1,282 @@
+// [license header, package declaration, and imports identical to the first excerpt above]
+
+@EverythingIsNonnullByDefault
+public class GroupingAggregatorFactory extends AggregatorFactory
+{
+  private static final Comparator<Long> VALUE_COMPARATOR = Long::compare;
+  private final String name;
+  private final List<String> groupings;
+  private final long value;
+  @Nullable
+  private final Set<String> keyDimensions;
+
+  @JsonCreator
+  public GroupingAggregatorFactory(
+      @JsonProperty("name") String name,
+      @JsonProperty("groupings") List<String> groupings
+  )
+  {
+    this(name, groupings, null);
+  }
+
+  @VisibleForTesting
+  GroupingAggregatorFactory(
+      String name,
+      List<String> groupings,
+      @Nullable Set<String> keyDimensions
+  )
+  {
+    Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator name");
+    this.name = name;
+    this.groupings = groupings;
+    this.keyDimensions = keyDimensions;
+    value = groupingId(groupings, keyDimensions);
+  }
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+    return new LongConstantAggregator(value);
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
+  {
+    return new LongConstantBufferAggregator(value);
+  }
+
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    return new LongConstantVectorAggregator(value);
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    return true;
+  }
+
+  /**
+   * Replace the param {@code keyDimensions} with the new set of key dimensions
+   */
+  public GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions)
+  {
+    return new GroupingAggregatorFactory(name, groupings, newKeyDimensions);
+  }
+
+  @Override
+  public Comparator getComparator()
+  {
+    return VALUE_COMPARATOR;
+  }
+
+  @JsonProperty
+  public List<String> getGroupings()
+  {
+    return groupings;
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  public long getValue()
+  {
+    return value;
+  }
+
+  @Nullable
+  @Override
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)
+  {
+    if (null == lhs) {
+      return rhs;
+    }
+    return lhs;
+  }
+
+  @Override
+  public AggregatorFactory getCombiningFactory()
+  {
+    return new GroupingAggregatorFactory(name, groupings, keyDimensions);
+  }
+
+  @Override
+  public List<AggregatorFactory> getRequiredColumns()
+  {
+    return Collections.singletonList(new GroupingAggregatorFactory(name, groupings, keyDimensions));
+  }
+
+  @Override
+  public Object deserialize(Object object)
+  {
+    return object;
+  }
+
+  @Nullable
+  @Override
+  public Object finalizeComputation(@Nullable Object object)
+  {
+    return object;
+  }
+
+  @Override
+  public List<String> requiredFields()
+  {
+    // The aggregator doesn't need to read any fields.
+    return Collections.emptyList();
+  }
+
+  @Override
+  public ValueType getType()
+  {
+    return ValueType.LONG;
+  }
+
+  @Override
+  public ValueType getFinalizedType()
+  {
+    return ValueType.LONG;
+  }
+
+  @Override
+  public int getMaxIntermediateSize()
+  {
+    return Long.BYTES;
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    CacheKeyBuilder keyBuilder = new CacheKeyBuilder(AggregatorUtil.GROUPING_CACHE_TYPE_ID)
+        .appendStrings(groupings);
+    if (null != keyDimensions) {
+      keyBuilder.appendStrings(keyDimensions);
+    }
+    return keyBuilder.build();
+  }
+
+  /**
+   * Given the list of grouping dimensions, returns a long value where each bit at position X in the returned value
+   * corresponds to the dimension in groupings at the same position X. X is the position relative to the right end. If
+   * keyDimensions contains the grouping dimension at position X, the bit is set to 0 at position X; otherwise it is
+   * set to 1. An example adapted from Microsoft SQL documentation
```

Review comment: I think we can just drop the last statement here. It's the opposite of what their documentation says anyway.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
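As a footnote to the `groupingId` javadoc discussed above, the bit-mask it describes can be sketched as follows. This is a hypothetical re-implementation based solely on the javadoc text (it takes a non-null `keyDimensions` for simplicity, whereas the real field is nullable), not Druid's actual code:

```java
import java.util.List;
import java.util.Set;

public class GroupingIdSketch
{
  // Bit X (counted from the right end) corresponds to the dimension at the
  // mirrored position in `groupings`: 0 if the dimension is one of the key
  // dimensions of the current grouping, 1 if it was rolled up.
  static long groupingId(List<String> groupings, Set<String> keyDimensions)
  {
    long id = 0;
    for (String dimension : groupings) {
      id <<= 1;
      if (!keyDimensions.contains(dimension)) {
        id |= 1;
      }
    }
    return id;
  }

  public static void main(String[] args)
  {
    // groupings = [a, b, c], keyDimensions = {a, c}: a->0, b->1, c->0 => 0b010 = 2
    System.out.println(groupingId(List.of("a", "b", "c"), Set.of("a", "c")));
    // groupings = [a, b, c], keyDimensions = {a, b}: a->0, b->0, c->1 => 0b001 = 1
    System.out.println(groupingId(List.of("a", "b", "c"), Set.of("a", "b")));
  }
}
```

This matches the semantics of SQL's `GROUPING_ID`: each result row carries a compact encoding of which grouping dimensions were actually part of its subtotal.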
