clintropolis commented on code in PR #16068:
URL: https://github.com/apache/druid/pull/16068#discussion_r1538804829


##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/DimensionToIdConverter.java:
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.column;
+
+import org.apache.druid.segment.ColumnValueSelector;
+
+import javax.annotation.Nullable;
+
+/**
+ * Interface for converters of dimension to dictionary id.
+ *
+ * This is a slightly convoluted interface because it also encapsulates the 
additional logic for handling multi-value
+ * dimensions. It has an additional step that converts the given dimensions to 
"dimension holders", which represent the
+ * multi-value holders for a given dimension.
+ * Therefore, the conversion goes from ColumnValueSelector -> DimensionHolder 
-> DictionaryID (for each dimension in the holder)
+ *
+ * The dimension holder is only applicable for multi-value strings.
+ * For other dimensions that cannot have multi-values the dimension holder is 
identical to the dimension. They can be
+ * defensively cast or homogenised, for example doubles to floats for float 
selectors or Long[] to Object[] for array
+ * selectors, so that the upstream callers can assume the class of the 
dimensions. The size of these dimensions is always 1,
+ * and only contain a value at index 0.
+ *
+ * Converting a value to its dictionary id might require building dictionaries 
on the fly while computing the id. The
+ * return type of the methods, except {@link #multiValueSize}, takes that into 
account.
+ *
+ * The implementations can pre-convert the value to the dictionaryId while 
extracting the dimensionHolder. Extracting
+ * dictionary id for a specific value from the (potentially multi-value 
dimension holder) can be done by calling
+ * {@link #getIndividualValueDictId} and passing the index to the multi-value.
+ *
+ * @see IdToDimensionConverter for converting the dictionary values back to 
dimensions
+ *
+ * @param <DimensionHolderType> Type of the dimension holder
+ */
+public interface DimensionToIdConverter<DimensionHolderType>

Review Comment:
   well, rather than explicitly writing that function, the idea would be to 
automatically wrap the `storageAdapter` in an `UnnestStorageAdapter` for each 
multi-value string being grouped. The cursors made by `UnnestStorageAdapter` 
already use dimension selectors in the case of a multi-value string, and so can 
produce dictionary encoded dimension selectors from the cursor as well, so we 
should be able to retain grouping directly on the dimension id. If i recall 
correctly, because we use the dimension selector for the unnest cursor, I 
_think_ it should behave the same way as the grouping engine with regards to 
`null` and `[]`, but I would have to confirm.
   
   This is probably too big of a change for this PR though, so I don't think 
this refactor needs to be done right now, but is worth pursuing in the future I 
think because it feels like the right way to handle things, then grouping 
itself has no concept of multi-value dimensions which sounds pretty amazing to 
me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to