paul-rogers opened a new pull request, #12709:
URL: https://github.com/apache/druid/pull/12709

   PR #12636 amends Druid to make a single pass through the Calcite planner. 
This was originally done in support of the catalog project to avoid resolving 
the same catalog object multiple times, which could encounter race conditions 
if the resolutions occurred on either side of a change to the catalog.
   
   While discussing that change, @clintropolis called our attention to the 
"query optimized" way that Druid handles parameters, which then raised 
questions about how we ensure the proposed changes do not break our JDBC 
support. @gianm suggested we focus testing on JDBC.
   
   That testing revealed that Druid's implementation of the JDBC protocol is a 
bit off, which may have led to some of the confusion around when to prepare and 
when to expect parameters. [Issue 
#12682](https://github.com/apache/druid/issues/12682) and [issue 
#12684](https://github.com/apache/druid/issues/12684) spell out the gaps.
   
   While JDBC has two kinds of statements (`Statement` and 
`PreparedStatement`), Druid has just one that attempts to do what the two JDBC 
statements do. That is, it tries to be both prepared and non-prepared.
   
   Also, while JDBC separates the ideas of statement and `ResultSet`, Druid 
combines the two: closing the result set closes the statement as well. While 
this might be handy for a `Statement`, it works at cross-purposes with a 
`PreparedStatement`, which is designed to be prepared once and executed 
multiple times. Closing the statement after the first execution forces a 
`PreparedStatement` to behave identically to a `Statement` (except that the 
former takes parameters).
   
   ### Corrects the Druid JDBC Driver Behavior
   
   We could argue about the merits of the Druid design. However, JDBC is a 
standard and the correct behavior is spelled out in the [JDBC 4.1 
Spec](https://download.oracle.com/otn-pub/jcp/jdbc-4_1-mrel-spec/jdbc4.1-fr-spec.pdf?AuthParam=1655858260_d6388aeec9aeb8e5616d221749b1ff71).
 That behavior is:
   
   * A `Statement` is a reusable object that can execute multiple SQL 
statements, each producing a `ResultSet`. Closing the `ResultSet` does not 
close the statement. Druid can, however, close the `ResultSet` on EOF.
   * A `PreparedStatement` is a reusable object created with a (typically 
parameterized) SQL statement. Each execution is given a set of parameter 
values, and produces a `ResultSet`. Again, closing the `ResultSet` does not 
close the statement, which allows the statement to be executed with a different 
set of parameters.
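   The contract can be illustrated with a toy model (plain Java, not Druid code; the class names here are invented for illustration): closing a result set leaves the statement reusable, while closing the statement releases any open result set.

```java
import java.util.List;

// Toy model of the JDBC lifecycle contract described above.
class ToyResultSet {
  private boolean open = true;
  private final List<String> rows;

  ToyResultSet(List<String> rows) { this.rows = rows; }

  List<String> rows() { return rows; }
  boolean isOpen() { return open; }

  // Closing the result set releases only the result set.
  void close() { open = false; }
}

class ToyPreparedStatement {
  private boolean open = true;
  private ToyResultSet current;

  // Each execution binds a parameter value and yields a fresh result set.
  ToyResultSet executeWith(String param) {
    if (!open) {
      throw new IllegalStateException("statement is closed");
    }
    current = new ToyResultSet(List.of("row for " + param));
    return current;
  }

  boolean isOpen() { return open; }

  // Closing the statement also closes any open result set.
  void close() {
    if (current != null) {
      current.close();
    }
    open = false;
  }
}

public class LifecycleDemo {
  public static void main(String[] args) {
    ToyPreparedStatement stmt = new ToyPreparedStatement();

    // Execute twice with different parameters; closing each result set
    // does not close the statement.
    stmt.executeWith("Bob").close();
    ToyResultSet rs = stmt.executeWith("Alice");
    if (!stmt.isOpen()) {
      throw new AssertionError("closing a result set must not close the statement");
    }

    // Closing the statement releases the open result set too.
    stmt.close();
    if (rs.isOpen()) {
      throw new AssertionError("closing the statement must close its result set");
    }
  }
}
```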
   
   To correct the Druid behavior, the Avatica JDBC implementation is refactored 
to clearly model the JDBC concepts:
   
   * `DruidJdbcStatement` models a (non-prepared) Statement.
   * `DruidJdbcPreparedStatement` models a PreparedStatement.
   * `AbstractDruidJdbcStatement` is the common base class for the above.
   * `DruidJdbcResultSet` represents a result set (a running query).
   
   While PR #12636 went on to modify the planner and the `SqlLifecycle` layers, 
this PR leaves those layers unchanged, and implements the desired behavior with 
the existing code. The hope is that the result is a bit easier on reviewers. 
This code will be modified again later to support the single-pass planner.
   
   The Avatica driver tests were modified to be consistent with the JDBC 
standard. The original code didn't pass, but the revised code does.
   
   The result is that we now have a clean SQL stack for JDBC, HTTP and internal 
use that makes a single planner pass when possible, and uses two passes for the 
JDBC prepare/execute model.
   
   ### Flaky Test: `DruidAvaticaHandlerTest.testConcurrentQueries()`
   
   Also fixes a flaky test: `DruidAvaticaHandlerTest.testConcurrentQueries()`. 
This test has occasionally failed in builds. The problem was first mentioned in 
2017 in [Issue #4408](https://github.com/apache/druid/issues/4408) , which was 
claimed to be fixed. When run in a tight loop, however, the test eventually 
fails with an NPE. The issue is traced to the context map on the 
`DruidConnection`: it is passed to each statement, where it is shared and 
concurrently modified. The fix is to make a copy for each new statement, which 
prevents concurrent queries from sharing the same map.
   
   Verified the problem by running the test 100 times in a tight loop. Verified 
the fix the same way.
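   The shape of the fix can be sketched as follows (hypothetical class names, not the actual Druid code): the connection hands each new statement its own copy of the context map rather than the shared instance.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix: per-statement copies of the connection context.
class DemoStatement {
  final Map<String, Object> context;

  DemoStatement(Map<String, Object> context) { this.context = context; }
}

class DemoConnection {
  private final Map<String, Object> context;

  DemoConnection(Map<String, Object> context) {
    this.context = new HashMap<>(context);
  }

  // Before the fix, the connection's map itself was passed to every
  // statement, so concurrent statements mutated shared state. The fix
  // copies the map for each new statement.
  DemoStatement createStatement() {
    return new DemoStatement(new HashMap<>(context));
  }

  Map<String, Object> context() { return context; }
}

public class ContextCopyDemo {
  public static void main(String[] args) {
    DemoConnection conn = new DemoConnection(Map.of("timeout", 1000));
    DemoStatement s1 = conn.createStatement();
    DemoStatement s2 = conn.createStatement();

    // A statement mutating its context no longer affects the
    // connection or any sibling statement.
    s1.context.put("sqlQueryId", "abc");
    if (conn.context().containsKey("sqlQueryId")
        || s2.context.containsKey("sqlQueryId")) {
      throw new AssertionError("context leaked across statements");
    }
  }
}
```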
   
   ### Minor Revisions
   
   Careful inspection of the code identified several other improvements:
   
   * Hold the yielder thread pool on the connection, not the statement, so we 
get some benefit from caching. (It might be even better to define the pool 
globally.)
   * Define a `SqlRequest` to hold the quad of (SQL, context, parameters, auth 
result), which makes function signatures simpler.
   * Shifted some functionality from `DruidMeta` to `DruidConnection`. 
`DruidMeta` resolves connections, then connections work with statements.
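   As a rough sketch of such a holder (the field names here are assumed; the actual `SqlRequest` in this PR may differ), grouping the four values shortens signatures that previously threaded them individually:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a holder for the (SQL, context, parameters,
// auth result) quad described above.
public class SqlRequestDemo {
  static class SqlRequest {
    final String sql;
    final Map<String, Object> context;
    final List<Object> parameters;
    final Object authResult; // stands in for Druid's AuthenticationResult

    SqlRequest(
        String sql,
        Map<String, Object> context,
        List<Object> parameters,
        Object authResult) {
      this.sql = sql;
      this.context = context;
      this.parameters = parameters;
      this.authResult = authResult;
    }
  }

  // A method that previously took four arguments now takes one.
  static String describe(SqlRequest request) {
    return request.sql + " with " + request.parameters.size() + " parameter(s)";
  }

  public static void main(String[] args) {
    SqlRequest request = new SqlRequest(
        "SELECT * FROM wiki WHERE page = ?",
        Map.of("sqlTimeZone", "UTC"),
        List.of("Druid"),
        "allowed");
    System.out.println(describe(request));
  }
}
```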
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] been tested in a test Druid cluster.
   

