clintropolis opened a new pull request #12241:
URL: https://github.com/apache/druid/pull/12241
### Description
This PR reworks how the SQL planner constructs `DruidExpression` and how it
translates them into `VirtualColumn` when constructing the native query.
Prior to the changes in this PR, `DruidExpression` were flattened string
expressions, eagerly constructed into a `VirtualColumn` via the
`VirtualColumnRegistry`, whenever required for use in projections,
aggregations, filters, etc. This meant that basically all of the expression
structure was lost after converting to the string expression, making it
impossible to modify them to try to optimize the query prior to conversion the
the native JSON query.
Now, `DruidExpression` retains the expression tree structure, each
containing a `List<DruidExpression>` that contains any sub-expressions which at
the time of virtual column creation is built into the string expression, using
a newly added interface
```
public interface ExpressionBuilder
{
String buildExpression(List<DruidExpression> arguments);
}
```
which is added to the `DruidExpression` when it is created. Another new
interface has been added to aid in rewriting `DruidExpression` trees before
planning into a native `Query`
```
public interface DruidExpressionShuttle
{
DruidExpression visit(DruidExpression expression);
default List<DruidExpression> visitAll(List<DruidExpression> expressions)
{
List<DruidExpression> list = new ArrayList<>(expressions.size());
for (DruidExpression expr : expressions) {
list.add(visit(expr));
}
return list;
}
}
```
(along with a `visit` method on `DruidExpression` to accept the shuttle).
`DruidExpression` are also now annotated, with one of 4 roles, `LITERAL`,
`IDENTIFIER`, `EXPRESSION`, and `SPECIALIZED`, the last of which, coupled with
the shuttle visitor, is the primary motivator of this PR.
Using `MV_FILTER_ONLY`/`MV_FILTER_NONE` as an example, which has a "native"
specialized `VirtualColumn` implementation, any expression which supplies such
a specialized implementation will now always use it.
Prior to this PR, when used in some composition, e.g.
```
SELECT MV_LENGTH(MV_FILTER_ONLY(dim3, ARRAY['b'])), SUM(cnt) FROM
druid.numfoo GROUP BY 1 ORDER BY 2 DESC
```
this function would use a fall-back expression
```
"virtualColumns" : [ {
"type" : "expression",
"name" : "v0",
"expression" : "array_length(filter((x) -> array_contains(array('b'),
x), "dim3"))",
"outputType" : "LONG"
} ],
```
but now plans to this instead
```
"virtualColumns" : [ {
"type" : "expression",
"name" : "v0",
"expression" : "array_length(\"v1\")",
"outputType" : "LONG"
}, {
"type" : "mv-filtered",
"name" : "v1",
"delegate" : {
"type" : "default",
"dimension" : "dim3",
"outputName" : "dim3",
"outputType" : "STRING"
},
"values" : [ "b" ],
"isAllowList" : true
} ],
```
<hr>
In the process, I uncovered additional bugs with multi-value string array
expressions, similar to the bugs fixed #12078 (and controlled by the flag
#12210)
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]