paul-rogers opened a new pull request, #12845: URL: https://github.com/apache/druid/pull/12845
With the recent changes to make a single pass through the planner, and to tidy up the JDBC semantics, we are now in a position to tighten up the semantics for SQL statement handling by replacing `SqlLifecycle` with a set of focused “statement” classes. ## Motivation In the recent discussion around the “single-pass planner”, @clint and @gianm provided a helpful description of some of the history as to how the code came to be as it is. One of the major drivers, we discovered, was the need to handle dynamic parameters for JDBC. We discovered that the JDBC implementation made some awkward assumptions, which were fixed in the recent “Avatica” PR. With the planner and JDBC tidied up, we are now in a position to examine `SqlLifecycle`, which appears to have been modeled on the original native `QueryLifecycle` which represents a "query". The existing design attempts to handle a variety of different query use cases. Moving forward, we need to model not just a query (that is, a `SELECT` statement), but also `INSERT`, DDL for the catalog, and so on. As a result modeling the SQL lifecycle on a query is becoming awkward: we want to model it on a SQL *statement*. While the full set of use cases were unclear during original development, by now, usage has settled down and we can enumerate them: HTTP requests, JDBC operations, and internal uses (primarily tests.) It has turned out to be awkward to try to model all these use cases as a query, and to have a single class that handles all use cases. In particular, it turns out we need to do things differently or JDBC than we do for HTTP queries. HTTP and JDBC are different enough that one size does not fit all. For HTTP, we do “direct” statement execution: we are presented with the query, context and parameters, and we make a straight shot through the planner to execution, stopping only if we encounter an error. This is the same pattern (but with somewhat different inputs) used by tests. On the other hand, for JDBC, we are obligated to implement a “prepare once, execute many” model. https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html In our case, this means doing validation without parameter values (which are not yet provided by JDBC.) Then, to execute, we must replan with parameter values. Since we use a “query optimized” model, this means starting over as in the original code. Our challenge is to have `SqlLifecycle` implement two distinct processing flows, while strictly enforcing a set of states. This has caused the code to get rather complex. ## Restructured Classes The solution is to replace `SqlLifecycle` with classes focused on the two use cases identified above: a “direct” statement and a “prepared” statement: * `DirectStatement`: One which implements the single-pass model, using the auth result for authentication. * `HttpStatement`: a variation of `DirectStatement` which uses the servlet request for authentication. * `PreparedStatement`: implements the prepare-once, execute-many model. The resulting code is far simpler and easier to test. We now have a clean SQL stack for JDBC, HTTP and internal use that make a single planner pass when possible, and use two passes for the JDBC prepare/execute model. The JDBC classes are then implemented using the statement abstractions described above. A JDBC `ResultSet` is a `DirectStatement`. A JDBC `PreparedStatement` is a Druid `PreparedStatement`. The prior JDBC-related PR did the restructuring that allows us to now map JDBC directly to Druid statements. ## Secondary Changes Other supporting changes in this PR include: * The `DruidPlanner` no longer has the “reset” functionality: each planner instance is used by a single statement, only once. * The code to gather and report stats is factored out into a new `SqlExecutionReporter` class which is shared by the new statement classes. * The class to cancel queries, `SqlLifecycleManager` is refactored to use a `Cancellable` interface instead of depending on the details of `SqlLifecycle`. * `SqlToolbox` is added to hold the many knick-knacks passed from the `SqlLivecycleFactory` into statements. * The old mock-based `SqlLifecycleTest` is replace with a new `SqlStatementTest`. * `BaseCalciteQueryTest` is modified to use the `SqlQueryPlus` object introduced recently. * Many tests were updated to remove a now-unnecessary `throws` clause to keep IntelliJ inspections happy. * `PlannerConfig` us used with many variations in tests. Because there is no constructor, each variation created a subclass to change one value. A builder is added to make customizing the config easier in tests. <hr> This PR has: - [X] been self-reviewed. - [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] been tested in a test Druid cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
