[GitHub] [druid] gianm commented on a diff in pull request #14025: MSQ: Support multiple result columns with the same name.

via GitHub Wed, 12 Apr 2023 09:39:40 -0700


gianm commented on code in PR #14025:
URL: https://github.com/apache/druid/pull/14025#discussion_r1164385083



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1856,35 +1873,38 @@ private static Pair<List<DimensionSchema>, 
List<AggregatorFactory>> makeDimensio
     // Each column can be of either time, dimension, aggregator. For this 
method. we can ignore the time column.
     // For non-complex columns, If the aggregator factory of the column is not 
available, we treat the column as
     // a dimension. For complex columns, certains hacks are in place.
-    for (final String outputColumn : outputColumnsInOrder) {
-      final String queryColumn = 
columnMappings.getQueryColumnForOutputColumn(outputColumn);
+    for (final String outputColumnName : outputColumnsInOrder) {
+      // Iterables.getOnlyElement because this method is only called during 
ingestion, where we require
+      // that output names be unique.
+      final int outputColumn = 
Iterables.getOnlyElement(columnMappings.getOutputColumnsByName(outputColumnName));

Review Comment:
   Same comment from me as well 🙂



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -36,23 +35,32 @@
 import java.util.Set;
 import java.util.stream.Collectors;
 
+/**
+ * Maps column names from {@link MSQSpec#getQuery()} to output names desired 
by the user, in the order
+ * desired by the user.
+ *
+ * The {@link MSQSpec#getQuery()} is translated by {@link 
org.apache.druid.msq.querykit.QueryKit} into
+ * a {@link org.apache.druid.msq.kernel.QueryDefinition}. So, this class also 
represents mappings from
+ * {@link 
org.apache.druid.msq.kernel.QueryDefinition#getFinalStageDefinition()} into the 
output names desired
+ * by the user.
+ */
 public class ColumnMappings
 {
   private final List<ColumnMapping> mappings;
-  private final Map<String, String> outputToQueryColumnMap;
-  private final Map<String, List<String>> queryToOutputColumnsMap;
+  private final Map<String, IntList> outputNameToPositionMap;
+  private final Map<String, IntList> queryNameToPositionMap;

Review Comment:
   I don't really like abbreviations in names, so I went with the even longer 
`queryColumnnameToPositionMap`.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1841,13 +1856,15 @@ private static Pair<List<DimensionSchema>, 
List<AggregatorFactory>> makeDimensio
     if (isRollupQuery) {
       // Populate aggregators from the native query when doing an ingest in 
rollup mode.
       for (AggregatorFactory aggregatorFactory : ((GroupByQuery) 
query).getAggregatorSpecs()) {
-        String outputColumn = 
Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));
-        if (outputColumnAggregatorFactories.containsKey(outputColumn)) {
-          throw new ISE("There can only be one aggregator factory for column 
[%s].", outputColumn);
+        final int outputColumn =
+            
Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));

Review Comment:
   I tried doing this, but I felt it complexified the code too much, especially 
since this case "can't happen" (unless there is some kind of coding bug). I 
decided to do it by adding a method `CollectionUtils.getOnlyElement` that makes 
this pattern (`getOnlyElement` + custom error message`) a little easier, 
requiring less code at the call site.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] gianm commented on a diff in pull request #14025: MSQ: Support multiple result columns with the same name.

Reply via email to