LakshSingla commented on code in PR #16068:
URL: https://github.com/apache/druid/pull/16068#discussion_r1533320463
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/KeyMappingGroupByColumnSelectorStrategy.java:
##########
@@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.query.DimensionComparisonUtils;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.query.groupby.epinephelinae.Grouper;
+import org.apache.druid.query.ordering.StringComparator;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.NullableTypeStrategy;
+
+import javax.annotation.Nullable;
+import javax.annotation.concurrent.NotThreadSafe;
+import java.nio.ByteBuffer;
+
+/**
+ * Strategy for grouping dimensions which can have variable-width objects. Materializing such objects on the buffer
+ * require an additional step of mapping them to an integer index. The integer index can be materialized on the buffer within
+ * a fixed width, and is often backed by a dictionary representing the actual dimension object. It is used for arrays,
+ * strings, and complex types.
+ * <p>
+ * The visibility of the class is limited, and the callers must use one of the two variants of the mapping strategy:
+ * 1. {@link PrebuiltDictionaryStringGroupByColumnSelectorStrategy}
+ * 2. {@link DictionaryBuildingGroupByColumnSelectorStrategy}
+ * <p>
+ * TODO(laksh): Vet this change
+ * {@code null} can be represented by either -1 or the position of null in the dictionary it was stored when it was
+ * encountered. This is fine, because most of the time, the dictionary id has no value of its own, and is converted back to
+ * the value it represents, before doing further operations. The only place where it would matter would be when
+ * {@link IdToDimensionConverter#canCompareIds()} is true, and we compare directly on the dictionary ids for prebuilt
+ * dictionaries (we can't compare ids for the dictionaries built on the fly in the grouping strategy). However, in that case,
+ * it is guaranteed that the dictionaryId of null represented by the pre-built dictionary would be the lowest (most likely 0)
+ * and therefore nulls (-1) would be adjacent to nulls (represented by the lowest non-negative dictionary id), and would get
+ * grouped in the later merge stages.
+ *
+ * @param <DimensionType>> Class of the dimension
+ * @param <DimensionHolderType> Class of the "dimension holder". For single-value dimensions, the holder's type and the
+ * holder's object are equivalent to the dimension. For multi-value dimensions (only strings),
+ * the holder's type and the object are different, where the type would be {@link org.apache.druid.segment.data.IndexedInts}
+ * representing all the values in the multi-valued string, while the dimension type would be
+ * String
+ * @see DimensionToIdConverter encoding logic for converting value to dictionary
+ * @see IdToDimensionConverter decoding logic for converting back dictionary to value
+ */
+@NotThreadSafe
+class KeyMappingGroupByColumnSelectorStrategy<DimensionType, DimensionHolderType>

Review Comment:
   This makes more sense. It'll introduce some redundancy, but it should make the code a lot cleaner (especially when we handle single values)
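
   For anyone following the thread without the full PR open, the key-mapping idea described in the Javadoc can be sketched roughly as below. This is only an illustration of the "build a dictionary on the fly, write a fixed-width int id into the key buffer, decode it later" approach, not the code in the PR: `DictionaryKeySketch`, `writeKey`, and `readKey` are hypothetical names, values are plain strings, and the sketch stores null as an ordinary dictionary entry rather than as -1.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT Druid's KeyMappingGroupByColumnSelectorStrategy.
final class DictionaryKeySketch {
    // Forward dictionary: value -> id (HashMap accepts a null key, so null gets its own id here).
    private final Map<String, Integer> valueToId = new HashMap<>();
    // Reverse dictionary: id -> value, indexed by the assigned id.
    private final List<String> idToValue = new ArrayList<>();

    // Encode: assign the next free id on first sight, then write a fixed-width int into the key buffer.
    void writeKey(String value, ByteBuffer keyBuffer, int position) {
        Integer id = valueToId.get(value);
        if (id == null) {
            id = idToValue.size();
            valueToId.put(value, id);
            idToValue.add(value);
        }
        keyBuffer.putInt(position, id);
    }

    // Decode: read the int id back from the buffer and look up the original value.
    String readKey(ByteBuffer keyBuffer, int position) {
        return idToValue.get(keyBuffer.getInt(position));
    }

    public static void main(String[] args) {
        DictionaryKeySketch sketch = new DictionaryKeySketch();
        ByteBuffer keys = ByteBuffer.allocate(3 * Integer.BYTES);
        sketch.writeKey("foo", keys, 0);
        sketch.writeKey("bar", keys, Integer.BYTES);
        sketch.writeKey("foo", keys, 2 * Integer.BYTES);
        // Equal values share one id, so the first and third slots hold the same int.
        System.out.println(keys.getInt(0) == keys.getInt(2 * Integer.BYTES));
        System.out.println(sketch.readKey(keys, Integer.BYTES));
    }
}
```

   Ids assigned this way reflect insertion order, not value order, which is why the Javadoc notes that ids from on-the-fly dictionaries cannot be compared directly.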
