paul-rogers opened a new pull request, #12709: URL: https://github.com/apache/druid/pull/12709
PR #12636 amends Druid to make a single pass through the Calcite planner. This was originally done in support of the catalog project to avoid resolving the same catalog object multiple times, which could encounter race conditions if the resolutions occurred on either side of a change to the catalog. While discussing that change, @clintropolis called our attention to the "query optimized" way that Druid handles parameters, which raised questions about how we ensure the proposed changes do not break our JDBC support. @gianm suggested we focus testing on JDBC.

That testing revealed that Druid's implementation of the JDBC protocol is a bit off, which may have led to some of the confusion around when to prepare and when to expect parameters. [Issue #12682](https://github.com/apache/druid/issues/12682) and [issue #12684](https://github.com/apache/druid/issues/12684) spell out the gaps.

While JDBC has two kinds of statements (`Statement` and `PreparedStatement`), Druid has just one that attempts to do what both JDBC statements do: it tries to be both prepared and non-prepared. Also, while JDBC separates the ideas of statement and `ResultSet`, Druid combines the two: closing the result set closes the statement as well. While this might be handy for a `Statement`, it works at cross-purposes to a `PreparedStatement`, which is designed to be prepared once and executed multiple times. Closing the statement after the first execution forces a `PreparedStatement` to behave identically to a `Statement` (except that the former takes parameters).

### Corrects the Druid JDBC Driver Behavior

We could argue about the merits of the Druid design. However, JDBC is a standard, and the correct behavior is spelled out in the [JDBC 4.1 Spec](https://download.oracle.com/otn-pub/jcp/jdbc-4_1-mrel-spec/jdbc4.1-fr-spec.pdf?AuthParam=1655858260_d6388aeec9aeb8e5616d221749b1ff71).
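As a point of reference, the spec's lifecycle rules can be modeled in a few lines. The class names below deliberately echo the refactoring in this PR, but the bodies are purely illustrative toys, not the actual Avatica implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model of the lifecycle the JDBC spec requires. Illustrative only.
class ToyResultSet {
  private boolean open = true;
  private final List<String> rows;
  ToyResultSet(List<String> rows) { this.rows = rows; }
  List<String> rows() { return rows; }
  void close() { open = false; }     // closing the result set...
  boolean isOpen() { return open; }
}

// A (non-prepared) statement: reusable across many SQL strings.
class ToyStatement {
  private boolean open = true;
  ToyResultSet execute(String sql) {
    if (!open) throw new IllegalStateException("statement closed");
    return new ToyResultSet(List.of("row for: " + sql));
  }
  void close() { open = false; }
  boolean isOpen() { return open; }  // ...must NOT close the statement
}

// A prepared statement: "planned" once, executed many times with parameters.
class ToyPreparedStatement {
  private final Function<Object[], List<String>> plan;
  ToyPreparedStatement(String sql) {
    // Prepare once; each execution binds a fresh set of parameters.
    this.plan = args -> List.of(sql + " with args " + Arrays.toString(args));
  }
  ToyResultSet execute(Object... args) {
    return new ToyResultSet(plan.apply(args));
  }
}

public class JdbcLifecycleSketch {
  public static void main(String[] args) {
    ToyStatement stmt = new ToyStatement();
    stmt.execute("SELECT 1").close();
    System.out.println(stmt.isOpen());            // still open after rs.close()
    System.out.println(stmt.execute("SELECT 2").rows());

    ToyPreparedStatement ps = new ToyPreparedStatement("SELECT * FROM t WHERE x = ?");
    ps.execute(1).close();
    System.out.println(ps.execute(2).rows());     // reusable with new parameters
  }
}
```

The key point the toy makes concrete: the result set and the statement have independent lifetimes, and a prepared statement outlives any single execution.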
That behavior is:

* A `Statement` is a reusable object that can execute multiple SQL statements, each producing a `ResultSet`. Closing the `ResultSet` does not close the statement. Druid can, however, close the `ResultSet` on EOF.
* A `PreparedStatement` is a reusable object created with a (typically parameterized) SQL statement. Each execution is given a set of parameter values and produces a `ResultSet`. Again, closing the `ResultSet` does not close the statement, which allows the statement to be executed with a different set of parameters.

To correct the Druid behavior, the Avatica JDBC implementation is refactored to clearly model the JDBC concepts:

* `DruidJdbcStatement` models a (non-prepared) `Statement`.
* `DruidJdbcPreparedStatement` models a `PreparedStatement`.
* `AbstractDruidJdbcStatement` is the common base class for the above.
* `DruidJdbcResultSet` represents a result set (a running query).

While PR #12636 went on to modify the planner and `SqlLifecycle` layers, this PR leaves those layers unchanged and implements the desired behavior with the existing code. The hope is that the result is a bit easier on reviewers. This code will be modified again later to support the single-pass planner.

The Avatica driver tests were modified to be consistent with the JDBC standard. The original code didn't pass, but the revised code does. The result is that we now have a clean SQL stack for JDBC, HTTP, and internal use that makes a single planner pass when possible, and uses two passes for the JDBC prepare/execute model.

### Flaky Test: `DruidAvaticaHandlerTest.testConcurrentQueries()`

This PR also fixes a flaky test: `DruidAvaticaHandlerTest.testConcurrentQueries()`, which has occasionally failed in builds. The problem was first mentioned in 2017 in [Issue #4408](https://github.com/apache/druid/issues/4408), which was claimed to be fixed. When run in a tight loop, however, the test eventually fails with an NPE.
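The failure is the classic shared-mutable-map hazard. A minimal sketch of the bug and of the per-statement-copy fix follows; the helper methods and the `sqlQueryId` key are hypothetical illustrations, not Druid's actual fields:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: when every statement shares the connection's context
// map, one query's writes leak into another's. Names are hypothetical.
public class ConnectionContextSketch {
  // Buggy: every statement gets the same map, shared and concurrently modified.
  static Map<String, Object> sharedStatementContext(Map<String, Object> connectionContext) {
    return connectionContext;
  }

  // Fix: each new statement gets its own copy of the connection context.
  static Map<String, Object> copiedStatementContext(Map<String, Object> connectionContext) {
    return new HashMap<>(connectionContext);
  }

  public static void main(String[] args) {
    Map<String, Object> connCtx = new HashMap<>();

    // Sharing: statement B's write clobbers statement A's entry.
    Map<String, Object> a = sharedStatementContext(connCtx);
    Map<String, Object> b = sharedStatementContext(connCtx);
    a.put("sqlQueryId", "query-a");
    b.put("sqlQueryId", "query-b");
    System.out.println(a.get("sqlQueryId"));   // A now sees B's id

    // With a per-statement copy, each query keeps its own entries.
    Map<String, Object> c = copiedStatementContext(connCtx);
    Map<String, Object> d = copiedStatementContext(connCtx);
    c.put("sqlQueryId", "query-c");
    d.put("sqlQueryId", "query-d");
    System.out.println(c.get("sqlQueryId"));   // unaffected by d's write
  }
}
```

Under real concurrency the shared map also risks `ConcurrentModificationException` and NPEs during simultaneous reads and writes, which is the flavor of failure the flaky test exhibited.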
The issue is traced to the context map on the `DruidConnection`: it is passed to each statement, where it is shared and concurrently modified. The fix is to make a copy for each new statement, which avoids queries trying to share the same map. The problem was verified by running the test 100 times in a tight loop; the fix was verified the same way.

### Minor Revisions

Careful inspection of the code identified several other improvements:

* Hold the yielder thread pool on the connection, not the statement, so we get some benefit from caching. (It might be even better to define the pool globally.)
* Define a `SqlRequest` to hold the quad of (SQL, context, parameters, auth result), which makes function signatures simpler.
* Shifted some functionality from `DruidMeta` to `DruidConnection`: `DruidMeta` resolves connections, then connections work with statements.

<hr>

This PR has:

- [X] been self-reviewed.
- [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] been tested in a test Druid cluster.
