paul-rogers opened a new pull request, #12636:
URL: https://github.com/apache/druid/pull/12636
The Druid planner contains the following comment:
```text
* In some future this could perhaps re-use some of the work done by
{@link #validate(boolean)}
* instead of repeating it, but that day is not today.
```
Well, today *is* that day. The Druid planner now makes only one pass through
Calcite planner. This means we parse and validate the query only once, not
twice as before.
The original two-pass approach likely was a way to resolve a conflict. Druid
wants access to the "validator" used to validate a query so we can extract the
"resource actions" (the datasources which the query accesses.) The validator is
private to the default `CalcitePlanner`, so Druid created its own. However, to
then continue on to planning, the Calcite planner throws a fit because it
thinks we skipped the validation step, because we didn't do validation though
Calcites's own default planner. Hence, we had to start over and do it Calcite's
way.
Now, as it turns out, there is nothing special about the default
`CalcitePlannerImpl`: it seems to exist as a handy way to support
out-of-the-box, JDBC-based queries. More advanced use cases generally provide
their own implementation. So, this PR clones the Calcite class, but with
adjustments for Druid. The main adjustment is to provide access to the
validator.
Once we resolve the validation issue, the rest is just plumbing: saving the
parser state between validate on the one hand, and prepare/plan on the other.
`SqlLifecycle` and `DruidPlanner` change a bit to handle that task.
The PR also moves some validation-like code from the `plan()` method to
`validate()` to better fit into the new flow.
The key risk with this kind of change is that we break something. To catch
any regression, this work was done in a private branch that also had the
[planner test framework](https://github.com/apache/druid/pull/12545). The
planner artifacts (schema, logical plan, native query) were identical before
and after the change. The various `Calcite?QueryTest` cases provide a lighter
validation in this PR itself, since the planner framework is not yet in master,
nor is it included in this PR.
The result of this change is that:
* The Druid planner produces less "garbage" due to Calcite's heavy object
creation.
* We do half the work within Calcite, thus saving some CPU resource.
* Table (datasource) checks are done once, avoiding potential race
conditions if they change between validation and plan.
<hr>
This PR has:
- [X] been self-reviewed.
- [X] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]