paul-rogers opened a new pull request, #12845:
URL: https://github.com/apache/druid/pull/12845

   With the recent changes to make a single pass through the planner, and to 
tidy up the JDBC semantics, we are now in a position to tighten up the 
semantics for SQL statement handling by replacing `SqlLifecycle` with a set of 
focused “statement” classes.
   
   ## Motivation
   
   In the recent discussion around the “single-pass planner”, @clint and @gianm 
provided a helpful description of some of the history as to how the code came 
to be as it is. One of the major drivers, we discovered, was the need to handle 
dynamic parameters for JDBC. We discovered that the JDBC implementation made 
some awkward assumptions, which were fixed in the recent “Avatica” PR.
   
   With the planner and JDBC tidied up, we are now in a position to examine 
`SqlLifecycle`, which appears to have been modeled on the original native 
`QueryLifecycle` which represents a "query". The existing design attempts to 
handle a variety of different query use cases. Moving forward, we need to model 
not just a query (that is, a `SELECT` statement), but also `INSERT`, DDL for 
the catalog, and so on. As a result modeling the SQL lifecycle on a query is 
becoming awkward: we want to model it on a SQL *statement*. While the full set 
of use cases were unclear during original development, by now, usage has 
settled down and we can enumerate them: HTTP requests, JDBC operations, and 
internal uses (primarily tests.)
   
   It has turned out to be awkward to try to model all these use cases as a 
query, and to have a single class that handles all use cases. In particular, it 
turns out we need to do things differently or JDBC than we do for HTTP queries. 
HTTP and JDBC are different enough that one size does not fit all.
   
   For HTTP, we do “direct” statement execution: we are presented with the 
query, context and parameters, and we make a straight shot through the planner 
to execution, stopping only if we encounter an error. This is the same pattern 
(but with somewhat different inputs) used by tests.
   
   On the other hand, for JDBC, we are obligated to implement a “prepare once, 
execute many” model. 
https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html In our case, 
this means doing validation without parameter values (which are not yet 
provided by JDBC.) Then, to execute, we must replan with parameter values. 
Since we use a “query optimized” model, this means starting over as in the 
original code.
   
   Our challenge is to have `SqlLifecycle` implement two distinct processing 
flows, while strictly enforcing a set of states. This has caused the code to 
get rather complex.
   
   ## Restructured Classes
   
   The solution is to replace `SqlLifecycle` with classes focused on the two 
use cases identified above: a “direct” statement and a “prepared” statement:
   
   * `DirectStatement`: One which implements the single-pass model, using the 
auth result for authentication.
   * `HttpStatement`: a variation of `DirectStatement` which uses the servlet 
request for authentication.
   * `PreparedStatement`: implements the prepare-once, execute-many model.
   
   The resulting code is far simpler and easier to test. We now have a clean 
SQL stack for JDBC, HTTP and internal use that make a single planner pass when 
possible, and use two passes for the JDBC prepare/execute model.
   
   The JDBC classes are then implemented using the statement abstractions 
described above. A JDBC `ResultSet` is a `DirectStatement`. A JDBC 
`PreparedStatement` is a Druid `PreparedStatement`. The prior JDBC-related PR 
did the restructuring that allows us to now map JDBC directly to Druid 
statements.
   
   ## Secondary Changes
   
   Other supporting changes in this PR include:
   
   * The `DruidPlanner` no longer has the “reset” functionality: each planner 
instance is used by a single statement, only once.
   * The code to gather and report stats is factored out into a new 
`SqlExecutionReporter` class which is shared by the new statement classes.
   * The class to cancel queries, `SqlLifecycleManager` is refactored to use a 
`Cancellable` interface instead of depending on the details of `SqlLifecycle`.
   * `SqlToolbox` is added to hold the many knick-knacks passed from the 
`SqlLivecycleFactory` into statements.
   * The old mock-based `SqlLifecycleTest` is replace with a new 
`SqlStatementTest`.
   * `BaseCalciteQueryTest` is modified to use the `SqlQueryPlus` object 
introduced recently.
   * Many tests were updated to remove a now-unnecessary `throws` clause to 
keep IntelliJ inspections happy.
   * `PlannerConfig` us used with many variations in tests. Because there is no 
constructor, each variation created a subclass to change one value. A builder 
is added to make customizing the config easier in tests.
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to