Jackie-Jiang commented on a change in pull request #6503:
URL: https://github.com/apache/incubator-pinot/pull/6503#discussion_r566553949
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/util/TableConfigUtils.java
##########
@@ -278,11 +280,48 @@ public static void validateIngestionConfig(TableConfig tableConfig, @Nullable Sc
"Arguments of a transform function '" + arguments + "' cannot contain the destination column '"
+ columnName + "'");
}
+ columnToFunctionEvaluator.put(columnName, expressionEvaluator);
+ columnToTransformExpression.put(columnName, transformFunction);
+ }
+ Map<String, Integer> transformFunctionChainDepth = new HashMap<>();
+ for (String column : columnToFunctionEvaluator.keySet()) {
+ if (transformFunctionChainDepth(column, columnToFunctionEvaluator, transformFunctionChainDepth) > 2) {
+ Set<String> derivedArguments = columnToFunctionEvaluator.get(column).getArguments().stream()
+ .filter(a -> transformFunctionChainDepth.get(a) == 2).collect(Collectors.toSet());
+ throw new IllegalStateException(String.format(
+ "Derived columns: [%s] cannot be used as arguments to the transform function: %s of derived column: %s.",
+ derivedArguments, columnToTransformExpression.get(column), column));
+ }
}
}
}
}
+ /**
+ * Returns the depth of chaining in transform functions. Eg:
Review comment:
This calculation should be based on whether the argument columns exist in
the schema, not on depth alone. A column is derived iff all the arguments
of its transform function exist in the schema.
E.g.
- If column A is not in the schema (it only exists in the source data), then
`B = f(A)` is not derived, but `C = f(B)` is derived because column B exists
in the schema.
- If column A exists in the schema, then `B = f(A)` is derived, and `C =
f(B)` is a derived column on a derived column, which is invalid.
With this logic, if A is in the schema, then `B = f(A)` followed by `C = f(B)`
is invalid, and it can cause problems for the segment loader.
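
A rough sketch of the schema-based check described above, in case it helps. This is not the actual `TableConfigUtils` code: the class `DerivedColumnCheck`, its methods, and the plain `Map<String, List<String>>` of column to transform-function arguments are hypothetical stand-ins for the evaluator map in the PR, and the schema is reduced to a set of column names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Pinot.
final class DerivedColumnCheck {

  // A column is derived iff it has a transform function and all of that
  // function's arguments exist in the schema.
  // Assumption: a function with no arguments counts as derived, because
  // "all arguments exist in the schema" holds vacuously.
  static boolean isDerived(String column, Map<String, List<String>> columnToArguments,
      Set<String> schemaColumns) {
    List<String> arguments = columnToArguments.get(column);
    return arguments != null && schemaColumns.containsAll(arguments);
  }

  // Rejects any derived column whose transform function takes another
  // derived column as an argument.
  static void validate(Map<String, List<String>> columnToArguments, Set<String> schemaColumns) {
    for (Map.Entry<String, List<String>> entry : columnToArguments.entrySet()) {
      String column = entry.getKey();
      if (!isDerived(column, columnToArguments, schemaColumns)) {
        continue;
      }
      for (String argument : entry.getValue()) {
        if (isDerived(argument, columnToArguments, schemaColumns)) {
          throw new IllegalStateException(String.format(
              "Derived column: %s cannot be used as an argument to the transform function of derived column: %s.",
              argument, column));
        }
      }
    }
  }

  public static void main(String[] args) {
    // B = f(A), C = f(B)
    Map<String, List<String>> transforms = Map.of("B", List.of("A"), "C", List.of("B"));

    // A only exists in the source data: B is not derived, C is derived. Accepted.
    validate(transforms, Set.of("B", "C"));
    System.out.println("A not in schema: accepted");

    // A exists in the schema: B is derived, so C is derived on a derived column. Rejected.
    try {
      validate(transforms, Set.of("A", "B", "C"));
      System.out.println("A in schema: accepted");
    } catch (IllegalStateException e) {
      System.out.println("A in schema: rejected: " + e.getMessage());
    }
  }
}
```

The point of the sketch is that the decision only needs the schema column set and each function's argument list, so no chain depth has to be tracked.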