paul-rogers opened a new pull request, #12709: URL: https://github.com/apache/druid/pull/12709
PR #12636 amends Druid to make a single pass through the Calcite planner. This was originally done in support of the catalog project to avoid resolving the same catalog object multiple times, which could encounter race conditions if the resolutions occurred on either side of a change to the catalog. While discussing that change, @clintropolis called our attention to the "query optimized" way that Druid handles parameters, which raised questions about how we ensure the proposed changes do not break our JDBC support. @gianm suggested we focus testing on JDBC.

That testing revealed that Druid's implementation of the JDBC protocol is a bit off, which may have led to some of the confusion around when to prepare and when to expect parameters. [Issue #12682](https://github.com/apache/druid/issues/12682) and [issue #12684](https://github.com/apache/druid/issues/12684) spell out the gaps.

While JDBC has two kinds of statements (`Statement` and `PreparedStatement`), Druid has just one that attempts to do what both JDBC statements do: it tries to be both prepared and non-prepared. Also, while JDBC separates the ideas of statement and `ResultSet`, Druid combines the two: closing the result set closes the statement as well. While this might be handy for a `Statement`, it works at cross-purposes to a `PreparedStatement`, which is designed to be prepared once and executed multiple times. Closing the statement after the first execution forces a `PreparedStatement` to behave identically to a `Statement` (except that the former takes parameters).

### Corrects the Druid JDBC Driver Behavior

We could argue about the merits of the Druid design. However, JDBC is a standard, and the correct behavior is spelled out in the [JDBC 4.1 Spec](https://download.oracle.com/otn-pub/jcp/jdbc-4_1-mrel-spec/jdbc4.1-fr-spec.pdf?AuthParam=1655858260_d6388aeec9aeb8e5616d221749b1ff71).
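As a point of reference, the spec's lifecycle rules can be modeled in a few lines. The class names below deliberately echo the refactoring in this PR, but the bodies are purely illustrative toys, not the actual Avatica implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model of the lifecycle the JDBC spec requires. Illustrative only.
class ToyResultSet {
  private boolean open = true;
  private final List<String> rows;
  ToyResultSet(List<String> rows) { this.rows = rows; }
  List<String> rows() { return rows; }
  void close() { open = false; }     // closing the result set...
  boolean isOpen() { return open; }
}

// A (non-prepared) statement: reusable across many SQL strings.
class ToyStatement {
  private boolean open = true;
  ToyResultSet execute(String sql) {
    if (!open) throw new IllegalStateException("statement closed");
    return new ToyResultSet(List.of("row for: " + sql));
  }
  void close() { open = false; }
  boolean isOpen() { return open; }  // ...must NOT close the statement
}

// A prepared statement: "planned" once, executed many times with parameters.
class ToyPreparedStatement {
  private final Function<Object[], List<String>> plan;
  ToyPreparedStatement(String sql) {
    // Prepare once; each execution binds a fresh set of parameters.
    this.plan = args -> List.of(sql + " with args " + Arrays.toString(args));
  }
  ToyResultSet execute(Object... args) {
    return new ToyResultSet(plan.apply(args));
  }
}

public class JdbcLifecycleSketch {
  public static void main(String[] args) {
    ToyStatement stmt = new ToyStatement();
    stmt.execute("SELECT 1").close();
    System.out.println(stmt.isOpen());            // still open after rs.close()
    System.out.println(stmt.execute("SELECT 2").rows());

    ToyPreparedStatement ps = new ToyPreparedStatement("SELECT * FROM t WHERE x = ?");
    ps.execute(1).close();
    System.out.println(ps.execute(2).rows());     // reusable with new parameters
  }
}
```

The key point the toy makes concrete: the result set and the statement have independent lifetimes, and a prepared statement outlives any single execution.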
That behavior is:

* A `Statement` is a reusable object that can execute multiple SQL statements, each producing a `ResultSet`. Closing the `ResultSet` does not close the statement. Druid can, however, close the `ResultSet` on EOF.
* A `PreparedStatement` is a reusable object created with a (typically parameterized) SQL statement. Each execution is given a set of parameter values and produces a `ResultSet`. Again, closing the `ResultSet` does not close the statement, which allows the statement to be executed with a different set of parameters.

To correct the Druid behavior, the Avatica JDBC implementation is refactored to clearly model the JDBC concepts:

* `DruidJdbcStatement` models a (non-prepared) `Statement`.
* `DruidJdbcPreparedStatement` models a `PreparedStatement`.
* `AbstractDruidJdbcStatement` is the common base class for the above.
* `DruidJdbcResultSet` represents a result set (a running query).

While PR #12636 went on to modify the planner and `SqlLifecycle` layers, this PR leaves those layers unchanged and implements the desired behavior with the existing code. The hope is that the result is a bit easier on reviewers. This code will be modified again later to support the single-pass planner.

The Avatica driver tests were modified to be consistent with the JDBC standard. The original code didn't pass, but the revised code does. The result is that we now have a clean SQL stack for JDBC, HTTP, and internal use that makes a single planner pass when possible, and uses two passes for the JDBC prepare/execute model.

### Flaky Test: `DruidAvaticaHandlerTest.testConcurrentQueries()`

This PR also fixes a flaky test: `DruidAvaticaHandlerTest.testConcurrentQueries()`, which has occasionally failed in builds. The problem was first mentioned in 2017 in [Issue #4408](https://github.com/apache/druid/issues/4408), which was claimed to be fixed. When run in a tight loop, however, the test eventually fails with an NPE.
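The failure is the classic shared-mutable-map hazard. A minimal sketch of the bug and of the per-statement-copy fix follows; the helper methods and the `sqlQueryId` key are hypothetical illustrations, not Druid's actual fields:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: when every statement shares the connection's context
// map, one query's writes leak into another's. Names are hypothetical.
public class ConnectionContextSketch {
  // Buggy: every statement gets the same map, shared and concurrently modified.
  static Map<String, Object> sharedStatementContext(Map<String, Object> connectionContext) {
    return connectionContext;
  }

  // Fix: each new statement gets its own copy of the connection context.
  static Map<String, Object> copiedStatementContext(Map<String, Object> connectionContext) {
    return new HashMap<>(connectionContext);
  }

  public static void main(String[] args) {
    Map<String, Object> connCtx = new HashMap<>();

    // Sharing: statement B's write clobbers statement A's entry.
    Map<String, Object> a = sharedStatementContext(connCtx);
    Map<String, Object> b = sharedStatementContext(connCtx);
    a.put("sqlQueryId", "query-a");
    b.put("sqlQueryId", "query-b");
    System.out.println(a.get("sqlQueryId"));   // A now sees B's id

    // With a per-statement copy, each query keeps its own entries.
    Map<String, Object> c = copiedStatementContext(connCtx);
    Map<String, Object> d = copiedStatementContext(connCtx);
    c.put("sqlQueryId", "query-c");
    d.put("sqlQueryId", "query-d");
    System.out.println(c.get("sqlQueryId"));   // unaffected by d's write
  }
}
```

Under real concurrency the shared map also risks `ConcurrentModificationException` and NPEs during simultaneous reads and writes, which is the flavor of failure the flaky test exhibited.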
The issue is traced to the context map on the `DruidConnection`: it is passed to each statement, where it is shared and concurrently modified. The fix is to make a copy for each new statement, which avoids queries trying to share the same map. The problem was verified by running the test 100 times in a tight loop; the fix was verified the same way.

### Minor Revisions

Careful inspection of the code identified several other improvements:

* Hold the yielder thread pool on the connection, not the statement, so we get some benefit from caching. (It might be even better to define the pool globally.)
* Define a `SqlRequest` to hold the quad of (SQL, context, parameters, auth result), which makes function signatures simpler.
* Shifted some functionality from `DruidMeta` to `DruidConnection`: `DruidMeta` resolves connections, then connections work with statements.

<hr>

This PR has:

- [X] been self-reviewed.
- [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] been tested in a test Druid cluster.
