[GitHub] [druid] paul-rogers opened a new pull request, #13196: Async reads for JDBC

GitBox Sat, 08 Oct 2022 21:41:27 -0700


paul-rogers opened a new pull request, #13196:
URL: https://github.com/apache/druid/pull/13196


   Prevents JDBC timeouts on long queries by returning empty batches when a 
batch fetch takes too long. Uses an async model to run the result fetch 
concurrently with JDBC requests.
   
   ### Release Notes
   
   Druid's Avatica-based JDBC handler is not the preferred way to run 
long-running queries in Druid.
   
   Druid supports the Avatica JDBC driver. Avatica uses HTTP requests to 
communicate with the server. When using JDBC with long-running queries, the 
HTTP request can time out, producing an error. This PR uses an internal feature 
of JDBC to avoid timeouts by returning "empty batches" of rows when Druid takes 
too long to return the actual rows. The JDBC client automatically requests more 
rows, resulting in Druid queries running asynchronously with JDBC requests. The 
result is that JDBC queries no longer time out.
   
   This feature is enabled by default with a timeout of 5 seconds. To change 
the timeout, set the `druid.sql.avatica.fetchTimeoutMs` property to the new 
value, specified in milliseconds. Druid enforces a minimum of 1000 
milliseconds to prevent hammering of the Broker.
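
As a rough illustration of the default and the minimum described above, here is a minimal sketch. The property name, the 5-second default and the 1000 ms floor come from this PR. The `FetchTimeoutConfig` class, reading from `java.util.Properties`, and clamping a too-small value up to the floor (rather than rejecting it) are assumptions made for the example and may not match how Druid actually loads or validates the setting.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Druid.
final class FetchTimeoutConfig {
  static final String PROPERTY = "druid.sql.avatica.fetchTimeoutMs";
  static final long DEFAULT_MS = 5_000; // default stated in the release notes
  static final long MIN_MS = 1_000;     // minimum stated in the release notes

  private FetchTimeoutConfig() {
  }

  // Returns the timeout to use: the default when the property is unset,
  // otherwise the configured value raised to the minimum if needed.
  // Assumption: small values are clamped rather than rejected.
  static long effectiveTimeoutMs(Properties props) {
    String raw = props.getProperty(PROPERTY);
    long configured = (raw == null) ? DEFAULT_MS : Long.parseLong(raw.trim());
    return Math.max(MIN_MS, configured);
  }
}
```

Under these assumptions, a configured value of 200 would be treated as 1000, and an unset property would yield 5000.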
   
   ### Description
   
   Druid's Avatica handler implementation already uses async execution to open 
and close the "yielder" for a query. This PR extends the idea by using the same 
executor for fetches. The fetch state (yielder, fetch offset) resides in a new 
`ResultFetcher` class that is invoked on each request to get results.
   
   The async logic is simple:
   
   * If a future from the previous fetch exists, use it.
   * Otherwise, invoke the fetcher using the existing `ExecutorService`, which 
returns a future.
   * Wait for the future to finish, but only up to the fetch timeout.
   * If the wait times out, save the future and return an empty batch.
   * If the results arrive in time, return them.
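
The steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `AsyncBatchReader`, its fields, and the use of a `Callable<List<Integer>>` as a stand-in for the fetcher are all hypothetical, and the sketch ignores details such as the fetch offset and end-of-results signalling.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; not Druid's actual ResultFetcher or statement code.
class AsyncBatchReader {
  private final ExecutorService exec;
  private final Callable<List<Integer>> fetcher; // stands in for the result fetcher
  private final long fetchTimeoutMs;
  private Future<List<Integer>> pending;         // fetch left over from a timed-out request

  AsyncBatchReader(ExecutorService exec, Callable<List<Integer>> fetcher, long fetchTimeoutMs) {
    this.exec = exec;
    this.fetcher = fetcher;
    this.fetchTimeoutMs = fetchTimeoutMs;
  }

  synchronized boolean hasPendingFetch() {
    return pending != null;
  }

  // Returns the next batch, or an empty list if the fetch did not finish in time.
  synchronized List<Integer> nextBatch() throws InterruptedException, ExecutionException {
    Future<List<Integer>> future = pending; // reuse the future from the previous request
    pending = null;
    if (future == null) {
      future = exec.submit(fetcher);        // otherwise start a new fetch
    }
    try {
      // Wait for the fetch, but only up to the fetch timeout.
      return future.get(fetchTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      pending = future;                     // keep the fetch running for the next request
      return Collections.emptyList();       // empty batch: the client will ask again
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    try {
      // A fetch that takes longer than the (artificially short) timeout.
      AsyncBatchReader reader = new AsyncBatchReader(
          exec,
          () -> {
            Thread.sleep(300);
            return Arrays.asList(1, 2, 3);
          },
          100);
      List<Integer> batch = reader.nextBatch();
      while (batch.isEmpty()) {
        System.out.println("empty batch, fetch still running");
        batch = reader.nextBatch();
      }
      System.out.println("rows: " + batch);
    } finally {
      exec.shutdown();
    }
  }
}
```

In this sketch the caller simply calls `nextBatch()` again after an empty batch, which mirrors the way the JDBC client requests more rows.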
   
   The close operation is modified to wait for any in-flight fetch to return 
before shutting down the result set. A `ResultFetcherFactory` creates the 
fetcher for each query. The factory is needed only to allow introducing 
artificial delays for testing.
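
To illustrate the close behavior described above, here is a minimal sketch of a result set whose `close()` waits for an in-flight fetch before releasing anything. The `FetchingResultSet` class and its methods are hypothetical and exist only for this example; the PR's actual classes and locking strategy may differ.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical illustration of "wait for the in-flight fetch, then close".
class FetchingResultSet implements AutoCloseable {
  private final ExecutorService exec;
  private Future<?> inFlight; // most recent fetch, if any
  private boolean closed;

  FetchingResultSet(ExecutorService exec) {
    this.exec = exec;
  }

  synchronized void startFetch(Runnable fetch) {
    if (closed) {
      throw new IllegalStateException("result set is closed");
    }
    inFlight = exec.submit(fetch);
  }

  synchronized boolean isClosed() {
    return closed;
  }

  @Override
  public synchronized void close() {
    if (closed) {
      return;
    }
    closed = true;
    if (inFlight != null) {
      try {
        inFlight.get(); // block until the running fetch returns
      } catch (ExecutionException e) {
        // The fetch failed; there is nothing left to wait for.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      inFlight = null;
    }
    // Underlying resources (the yielder, in Druid's case) would be released here.
  }
}
```

Waiting before releasing resources avoids closing the underlying result source while a worker thread is still reading from it.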
   
   The Avatica handler tests are cleaned up to eliminate some redundancy. The 
cleanup allowed the async tests to be written with less copy/paste than the 
existing code would have required.
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
      - [X] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [X] added documentation for new or modified features or behaviors.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious to an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

