[GitHub] [druid] clintropolis opened a new pull request #10880: add druid jdbc handler config for minimum number of rows per frame

GitBox Thu, 11 Feb 2021 04:28:20 -0800


clintropolis opened a new pull request #10880:
URL: https://github.com/apache/druid/pull/10880



   ### Description
   This PR adds a new config, `druid.sql.avatica.minRowsPerFrame` as a 
counterpart to `druid.sql.avatica.maxRowsPerFrame`, to give cluster operators 
some control over the size of frames (and so indirect control over the overall 
number of fetch requests required), as large result sets can require many 
thousands of frames to transfer at the default size of 100.
   
   Related, this config also serves as a sort of workaround to a missing 
feature (or bug?) of Avatica, described in 
https://issues.apache.org/jira/browse/CALCITE-2322, where the `fetchSize` 
parameter of a `Statement` or `PreparedStatement` are ignored and hard coded to 
use a default value of 100. There is an associated PR that is a couple of years 
old, https://github.com/apache/calcite-avatica/pull/49; with some minor fix-up 
to make things build I was able to locally patch my Druid and confirm it fixes 
the issue and allows us to pick up the user set `fetchSize` set on JDBC queries 
on our end. That PR seems to have lost momentum though, so I'm going to try to 
see if I can pick it up and get that issue fixed.
   
   This configuration is still useful even if the Avatica issue is resolved, 
because that fix allows JDBC users to give hints to the size of result sets 
that they would like, but this PR gives operators control to prevent very small 
fetch sizes with large result sets from choking things with a large number of 
fetch requests, or generally optimally tune default jdbc result sizes for 
typical workloads and override user requests.
   
   Frame size does have implications for JDBC performance as well. Using a 
simple utility, I did some tests with 500k total results using various fetch 
sizes (with the patched Avatica, but the same holds if done via the new server 
config).
   
   ```
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=100 > out.csv
   Results processed: 2971 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=1000 > out.csv
   Results processed: 1479 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=10000 > out.csv
   Results processed: 1166 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=100000 > out.csv
   Results processed: 2221 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=500000 > out.txt
   Results processed: 2184 millis
   ```
   
   There does appear to be a sort of goldilocks zone of not too small not too 
big, but that might vary per query and/or overall result set size, I haven't 
really played with it much.
   
   Going through a router seems to exaggerate the overhead of small frame sizes 
as well:
   
   ```
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=100 > out.csv
   Results processed: 7394 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=100 > out.csv
   Results processed: 2219 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=10000 > out.csv
   Results processed: 1518 millis
   $ ./druid-jdbc prepared-statement 
--jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ 
--path=./test.sql --fetch=10000 > out.csv
   Results processed: 1176 millis
   ```
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis opened a new pull request #10880: add druid jdbc handler config for minimum number of rows per frame

Reply via email to