paul-rogers commented on PR #12636: URL: https://github.com/apache/druid/pull/12636#issuecomment-1162659924
@clintropolis, thanks for the history of parameters. I reread the links that you generously provided: very helpful. @gianm your advice to look into the JDBC (Avatica) code path was wise: doing so was insightful. It turns out that Druid's implementation of the JDBC protocol is a bit off, which may have led to some of the confusion around when to prepare and when to expect parameters. Issue #12682 spells out the gaps. This latest PR revises how we manage SQL processing in both HTTP and JDBC to try to get things a bit cleaner. A key challenge is that `SqlLifecycle` tries to implement two distinct processing flows, while strictly enforcing a set of states. This has caused the code to get rather complex. According to the comments, `SqlLifecycle` evolved from the native `QueryLifecycle` optimize for native queries. But SQL, for better or worse, works differently. SQL works around the idea of _statements_ and _cursors_. In JDBC, statements are of two kinds: `Statement` and `PreparedStatement`. The cursor is represented by a `ResultSet`. Each statement can have one or more result sets open, and each statement can give rise to any number of result sets (one after another.) In a prepared statement, each execution tends to take a set of parameter values. Then, on the HTTP side, we have a "direct execution" model: we get a SQL statement, context variables and parameters, and run through the planner sequence: parse, validate, authorize, plan and execute. To model this behavior, the latest commit refactors `SqlLifecycle` in to focused "statement" classes: * `DirectStatement`: Implements a one-pass, directly executed statement. Accepts parameters. Makes a single pass through the planner. Produces a `Sequence` (Druid's result set) as output. * `HttpStatement`: A direct statement for use in the HTTP endpoint. * `PreparedStatement`: The Druid implementation of the JDBC prepared statement. Produces a `PrepareResult` as output. Allows execution by creating a `DirectStatement` which runs the query, with parameters. Supports the JDBC prepared statement model. The resulting code is far simpler and easier to test. Check out the result to see if the refactoring makes sense, and resolves the issues we've been discussing. Then, the Avatica JDBC implementation is refactored to clearly model the JDBC concepts: * `DruidJdbcStatement` models a (non-prepared) `Statement`. * `DruidJdbcPreparedStatement` models a `PreparedStatement`. * `AbstractDruidJdbcStatement` is the common base class for the above. * `DruidJdbcResultSet` represents a result set (a running query). The JDBC classes are then implemented using the statement abstractions described above. A JDBC `ResultSet` is a `DirectStatement`. A JDBC `PreparedStatement` is a Druid `PreparedStatement`. The Avatica driver tests were modified to be consistent with the JDBC standard. The original code didn't pass, but the revised code does. The result is that we now have a clean SQL stack for JDBC, HTTP and internal use that make a single planner pass when possible, and use two passes for the JDBC prepare/execute model. Please take a look and see what you think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
