gianm commented on code in PR #13179:
URL: https://github.com/apache/druid/pull/13179#discussion_r1134572518
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1841,7 +1841,16 @@ private static Pair<List<DimensionSchema>,
List<AggregatorFactory>> makeDimensio
if (isRollupQuery) {
// Populate aggregators from the native query when doing an ingest in
rollup mode.
for (AggregatorFactory aggregatorFactory : ((GroupByQuery)
query).getAggregatorSpecs()) {
- String outputColumn =
Iterables.getOnlyElement(columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName()));
+ List<String> outputColumns =
columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName());
+ if (outputColumns == null || outputColumns.size() != 1) {
+ throw new ISE(
+ "Cannot use aggregator [%s] with input fields [%s] in the rollup
mode. It might be using a post aggregator. Please check the native plan to
figure out more information about the aggregator. You can disable the rollup
mode or use a different aggregator. Please refer to SQL-based ingestion docs
for more details.",
Review Comment:
This error should really be in the SQL layer, for a couple reasons:
1. It's a validation-style error that doesn't require actually executing the
query. So if we can detect this problem in the SQL layer, users will get their
error faster, and won't have tasks launched needlessly.
2. We can refer to SQL concepts and SQL names rather than native concepts
like "aggregator" and native names like `a0`.
There's also a couple of issues with actionability that should be fixed when
moving this error to the SQL layer:
- It suggests "disable the rollup mode" as a way to fix the problem, but
"rollup mode" is not a thing in MSQ. As we mention in
`multi-stage-query/concepts.md`, rollup is something that MSQ does
automatically when certain conditions are met. The user would need to switch on
aggregation finalization, which may not be what they want.
- It suggests "refer to SQL-based ingestion docs", but doesn't provide a
link or a hint on what to look for. Doc references should be URLs.
- It suggests "check the native plan", but doesn't say how to do that.
(Although I think when we move this to SQL layer, we won't need this part.)
There's three places we can check things in the SQL layer:
1. In the SQL validation phase (after parsing, prior to optimizing), i.e. in
`validate` in `DruidPlanner` or `IngestHandler`.
2. Immediately prior to translation from logical plan to Druid plan, i.e. in
`MSQTaskSqlEngine#buildQueryMakerForInsert`
3. Immediately prior to execution, i.e. in `MSQTaskQueryMaker` (the latest
possible time to validate something before a controller task is launched)
Ideally we validate as much as possible as early as possible. Also, ideally,
we validate it in a place where we can know the SQL name of the agg function
(e.g. from an instance of `SqlAggFunction`). That'll let us include the SQL
name in the error message.
I'm not totally sure which place is best for this. My guess is (1) or (2),
since by (3) we've gone pretty much fully native. Perhaps @paul-rogers would
have some advice as he has some experience with the validation stack.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]