clintropolis opened a new pull request #10880: URL: https://github.com/apache/druid/pull/10880
### Description This PR adds a new config, `druid.sql.avatica.minRowsPerFrame` as a counterpart to `druid.sql.avatica.maxRowsPerFrame`, to give cluster operators some control over the size of frames (and so indirect control over the overall number of fetch requests required), as large result sets can require many thousands of frames to transfer at the default size of 100. Related, this config also serves as a sort of workaround to a missing feature (or bug?) of Avatica, described in https://issues.apache.org/jira/browse/CALCITE-2322, where the `fetchSize` parameter of a `Statement` or `PreparedStatement` are ignored and hard coded to use a default value of 100. There is an associated PR that is a couple of years old, https://github.com/apache/calcite-avatica/pull/49; with some minor fix-up to make things build I was able to locally patch my Druid and confirm it fixes the issue and allows us to pick up the user set `fetchSize` set on JDBC queries on our end. That PR seems to have lost momentum though, so I'm going to try to see if I can pick it up and get that issue fixed. This configuration is still useful even if the Avatica issue is resolved, because that fix allows JDBC users to give hints to the size of result sets that they would like, but this PR gives operators control to prevent very small fetch sizes with large result sets from choking things with a large number of fetch requests, or generally optimally tune default jdbc result sizes for typical workloads and override user requests. Frame size does have implications for JDBC performance as well. Using a simple utility, I did some tests with 500k total results using various fetch sizes (with the patched Avatica, but the same holds if done via the new server config). ``` $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=100 > out.csv Results processed: 2971 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=1000 > out.csv Results processed: 1479 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=10000 > out.csv Results processed: 1166 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=100000 > out.csv Results processed: 2221 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=500000 > out.txt Results processed: 2184 millis ``` There does appear to be a sort of goldilocks zone of not too small not too big, but that might vary per query and/or overall result set size, I haven't really played with it much. Going through a router seems to exaggerate the overhead of small frame sizes as well: ``` $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/ --path=./test.sql --fetch=100 > out.csv Results processed: 7394 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=100 > out.csv Results processed: 2219 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/ --path=./test.sql --fetch=10000 > out.csv Results processed: 1518 millis $ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=10000 > out.csv Results processed: 1176 millis ``` <hr> This PR has: - [x] been self-reviewed. - [x] added documentation for new or modified features or behaviors. - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [x] been tested in a test Druid cluster. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
