cryptoe commented on code in PR #14025:
URL: https://github.com/apache/druid/pull/14025#discussion_r1161739137
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1841,13 +1856,15 @@ private static Pair<List<DimensionSchema>,
List<AggregatorFactory>> makeDimensio
if (isRollupQuery) {
// Populate aggregators from the native query when doing an ingest in
rollup mode.
for (AggregatorFactory aggregatorFactory : ((GroupByQuery)
query).getAggregatorSpecs()) {
- String outputColumn =
Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));
- if (outputColumnAggregatorFactories.containsKey(outputColumn)) {
- throw new ISE("There can only be one aggregator factory for column
[%s].", outputColumn);
+ final int outputColumn =
+
Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));
Review Comment:
nit: `Iterables.getOnlyElement` fails with a non intuitive user error. Cam
we remove that call here and do a size check.
If the check fails throw an appropriate user error.
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1856,35 +1873,38 @@ private static Pair<List<DimensionSchema>,
List<AggregatorFactory>> makeDimensio
// Each column can be of either time, dimension, aggregator. For this
method. we can ignore the time column.
// For non-complex columns, If the aggregator factory of the column is not
available, we treat the column as
// a dimension. For complex columns, certains hacks are in place.
- for (final String outputColumn : outputColumnsInOrder) {
- final String queryColumn =
columnMappings.getQueryColumnForOutputColumn(outputColumn);
+ for (final String outputColumnName : outputColumnsInOrder) {
+ // Iterables.getOnlyElement because this method is only called during
ingestion, where we require
+ // that output names be unique.
+ final int outputColumn =
Iterables.getOnlyElement(columnMappings.getOutputColumnsByName(outputColumnName));
Review Comment:
Same comment here
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -66,34 +74,80 @@ public static ColumnMappings identity(final RowSignature
signature)
);
}
+ /**
+ * Number of output columns.
+ */
+ public int size()
+ {
+ return mappings.size();
+ }
+
+ /**
+ * All output column names, in order. Some names may appear more than once,
unless
+ * {@link #hasUniqueOutputColumnNames()} is true.
+ */
public List<String> getOutputColumnNames()
{
return
mappings.stream().map(ColumnMapping::getOutputColumn).collect(Collectors.toList());
}
+ /**
+ * Whether output column names from {@link #getOutputColumnNames()} are all
unique.
+ */
+ public boolean hasUniqueOutputColumnNames()
+ {
+ final Set<String> encountered = new HashSet<>();
+
+ for (final ColumnMapping mapping : mappings) {
+ if (!encountered.add(mapping.getOutputColumn())) {
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ /**
+ * Whether a particular output column name exists.
+ */
public boolean hasOutputColumn(final String columnName)
{
- return outputToQueryColumnMap.containsKey(columnName);
+ return outputNameToPositionMap.containsKey(columnName);
}
- public String getQueryColumnForOutputColumn(final String outputColumn)
+ /**
+ * Query column name for a particular output column position.
+ */
+ public String getQueryColumnName(final int outputColumn)
{
- final String queryColumn = outputToQueryColumnMap.get(outputColumn);
- if (queryColumn != null) {
- return queryColumn;
- } else {
- throw new IAE("No such output column [%s]", outputColumn);
- }
+ return mappings.get(outputColumn).getQueryColumn();
}
- public List<String> getOutputColumnsForQueryColumn(final String queryColumn)
+ /**
+ * Output column name for a particular output column position.
+ */
+ public String getOutputColumnName(final int outputColumn)
{
- final List<String> outputColumns =
queryToOutputColumnsMap.get(queryColumn);
- if (outputColumns != null) {
- return outputColumns;
- } else {
- return Collections.emptyList();
+ return mappings.get(outputColumn).getOutputColumn();
Review Comment:
Same defensive coding comment here.
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -66,34 +74,80 @@ public static ColumnMappings identity(final RowSignature
signature)
);
}
+ /**
+ * Number of output columns.
+ */
+ public int size()
+ {
+ return mappings.size();
+ }
+
+ /**
+ * All output column names, in order. Some names may appear more than once,
unless
+ * {@link #hasUniqueOutputColumnNames()} is true.
+ */
public List<String> getOutputColumnNames()
{
return
mappings.stream().map(ColumnMapping::getOutputColumn).collect(Collectors.toList());
}
+ /**
+ * Whether output column names from {@link #getOutputColumnNames()} are all
unique.
+ */
+ public boolean hasUniqueOutputColumnNames()
+ {
+ final Set<String> encountered = new HashSet<>();
+
+ for (final ColumnMapping mapping : mappings) {
+ if (!encountered.add(mapping.getOutputColumn())) {
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ /**
+ * Whether a particular output column name exists.
+ */
public boolean hasOutputColumn(final String columnName)
{
- return outputToQueryColumnMap.containsKey(columnName);
+ return outputNameToPositionMap.containsKey(columnName);
}
- public String getQueryColumnForOutputColumn(final String outputColumn)
+ /**
+ * Query column name for a particular output column position.
+ */
+ public String getQueryColumnName(final int outputColumn)
{
- final String queryColumn = outputToQueryColumnMap.get(outputColumn);
- if (queryColumn != null) {
- return queryColumn;
- } else {
- throw new IAE("No such output column [%s]", outputColumn);
- }
+ return mappings.get(outputColumn).getQueryColumn();
Review Comment:
Shouldn't we have defensive coding here since this is a public method.
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/report/MSQResultsReport.java:
##########
@@ -63,7 +68,7 @@ static MSQResultsReport fromJson(
}
@JsonProperty("signature")
- public RowSignature getSignature()
+ public List<ColumnAndType> getSignature()
Review Comment:
Nice. I guess we would not require any json changes with this.
+1
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java:
##########
@@ -257,7 +256,7 @@ private static void validateNoDuplicateAliases(final
List<Pair<Integer, String>>
for (final Pair<Integer, String> field : fieldMappings) {
if (!aliasesSeen.add(field.right)) {
- throw new ValidationException("Duplicate field in SELECT: " +
field.right);
+ throw new ValidationException("Duplicate field in SELECT: [" +
field.right + "]");
Review Comment:
```suggestion
throw new ValidationException("Duplicate field in INSERT/REPLACE: ["
+ field.right + "]");
```
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/ColumnMappings.java:
##########
@@ -36,23 +35,32 @@
import java.util.Set;
import java.util.stream.Collectors;
+/**
+ * Maps column names from {@link MSQSpec#getQuery()} to output names desired
by the user, in the order
+ * desired by the user.
+ *
+ * The {@link MSQSpec#getQuery()} is translated by {@link
org.apache.druid.msq.querykit.QueryKit} into
+ * a {@link org.apache.druid.msq.kernel.QueryDefinition}. So, this class also
represents mappings from
+ * {@link
org.apache.druid.msq.kernel.QueryDefinition#getFinalStageDefinition()} into the
output names desired
+ * by the user.
+ */
public class ColumnMappings
{
private final List<ColumnMapping> mappings;
- private final Map<String, String> outputToQueryColumnMap;
- private final Map<String, List<String>> queryToOutputColumnsMap;
+ private final Map<String, IntList> outputNameToPositionMap;
+ private final Map<String, IntList> queryNameToPositionMap;
Review Comment:
```suggestion
private final Map<String, IntList> queryColNameToPositionMap;
```
Nit: rename
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]