[GitHub] [druid] paul-rogers opened a new pull request, #12636: Today is that day - Single pass through Calcite planner

GitBox Sat, 11 Jun 2022 16:40:40 -0700


paul-rogers opened a new pull request, #12636:
URL: https://github.com/apache/druid/pull/12636


   The Druid planner contains the following comment:
   
   ```text
      * In some future this could perhaps re-use some of the work done by 
{@link #validate(boolean)}
      * instead of repeating it, but that day is not today.
   ```
   
   Well, today *is* that day. The Druid planner now makes only one pass through 
Calcite planner. This means we parse and validate the query only once, not 
twice as before.
   
   The original two-pass approach likely was a way to resolve a conflict. Druid 
wants access to the "validator" used to validate a query so we can extract the 
"resource actions" (the datasources which the query accesses.) The validator is 
private to the default `CalcitePlanner`, so Druid created its own. However, to 
then continue on to planning, the Calcite planner throws a fit because it 
thinks we skipped the validation step, because we didn't do validation though 
Calcites's own default planner. Hence, we had to start over and do it Calcite's 
way.
   
   Now, as it turns out, there is nothing special about the default 
`CalcitePlannerImpl`: it seems to exist as a handy way to support 
out-of-the-box, JDBC-based queries. More advanced use cases generally provide 
their own implementation. So, this PR clones the Calcite class, but with 
adjustments for Druid. The main adjustment is to provide access to the 
validator.
   
   Once we resolve the validation issue, the rest is just plumbing: saving the 
parser state between validate on the one hand, and prepare/plan on the other. 
`SqlLifecycle` and `DruidPlanner` change a bit to handle that task.
   
   The PR also moves some validation-like code from the `plan()` method to 
`validate()` to better fit into the new flow.
   
   The key risk with this kind of change is that we break something. To catch 
any regression, this work was done in a private branch that also had the 
[planner test framework](https://github.com/apache/druid/pull/12545). The 
planner artifacts (schema, logical plan, native query) were identical before 
and after the change. The various `Calcite?QueryTest` cases provide a lighter 
validation in this PR itself, since the planner framework is not yet in master, 
nor is it included in this PR.
   
   The result of this change is that:
   
   * The Druid planner produces less "garbage" due to Calcite's heavy object 
creation.
   * We do half the work within Calcite, thus saving some CPU resource.
   * Table (datasource) checks are done once, avoiding potential race 
conditions if they change between validation and plan.
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] paul-rogers opened a new pull request, #12636: Today is that day - Single pass through Calcite planner

Reply via email to