clintropolis opened a new pull request #12241:
URL: https://github.com/apache/druid/pull/12241


   ### Description
   
   This PR reworks how the SQL planner constructs `DruidExpression` and how it 
translates them into `VirtualColumn` when constructing the native query.
   
   Prior to the changes in this PR, `DruidExpression` were flattened string 
expressions, eagerly constructed into a `VirtualColumn` via the 
`VirtualColumnRegistry`, whenever required for use in projections, 
aggregations, filters, etc. This meant that basically all of the expression 
structure was lost after converting to the string expression, making it 
impossible to modify them to try to optimize the query prior to conversion the 
the native JSON query.
   
   Now, `DruidExpression` retains the expression tree structure, each 
containing a `List<DruidExpression>` that contains any sub-expressions which at 
the time of virtual column creation is built into the string expression, using 
a newly added interface
   ```
     public interface ExpressionBuilder
     {
       String buildExpression(List<DruidExpression> arguments);
     }
   ```
   which is added to the `DruidExpression` when it is created. Another new 
interface has been added to aid in rewriting `DruidExpression` trees before 
planning into a native `Query`
   
   ```
     public interface DruidExpressionShuttle
     {
       DruidExpression visit(DruidExpression expression);
   
       default List<DruidExpression> visitAll(List<DruidExpression> expressions)
       {
         List<DruidExpression> list = new ArrayList<>(expressions.size());
         for (DruidExpression expr : expressions) {
           list.add(visit(expr));
         }
         return list;
       }
     }
   ```
   
   (along with a `visit` method on `DruidExpression` to accept the shuttle).
   
   `DruidExpression` are also now annotated, with one of 4 roles, `LITERAL`, 
`IDENTIFIER`, `EXPRESSION`, and `SPECIALIZED`, the last of which, coupled with 
the shuttle visitor, is the primary motivator of this PR.
   
   Using `MV_FILTER_ONLY`/`MV_FILTER_NONE` as an example, which has a "native" 
specialized `VirtualColumn` implementation, any expression which supplies such 
a specialized implementation will now always use it.
   
   Prior to this PR, when used in some composition, e.g.
   
   ```
   SELECT MV_LENGTH(MV_FILTER_ONLY(dim3, ARRAY['b'])), SUM(cnt) FROM 
druid.numfoo GROUP BY 1 ORDER BY 2 DESC
   ```
   
   this function would use a fall-back expression
   ```
   "virtualColumns" : [ {
       "type" : "expression",
       "name" : "v0",
       "expression" : "array_length(filter((x) -> array_contains(array('b'), 
x), "dim3"))",
       "outputType" : "LONG"
     } ],
   ```
   
   but now plans to this instead
   ```
   "virtualColumns" : [ {
       "type" : "expression",
       "name" : "v0",
       "expression" : "array_length(\"v1\")",
       "outputType" : "LONG"
     }, {
       "type" : "mv-filtered",
       "name" : "v1",
       "delegate" : {
         "type" : "default",
         "dimension" : "dim3",
         "outputName" : "dim3",
         "outputType" : "STRING"
       },
       "values" : [ "b" ],
       "isAllowList" : true
     } ],
   ```
   
   <hr>
   
   In the process, I uncovered additional bugs with multi-value string array 
expressions, similar to the bugs fixed #12078 (and controlled by the flag 
#12210)
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to