Having kicked the tyres on this idea, I can report that it works
nicely. I went one step further and made the default idle pool size 0
rather than 1, which has the side benefit that Drill does not try to
connect out at all when it starts up, only upon receiving the first
query (HikariCP then caches that connection for some amount of
time). The advantage here is that if Drill gets restarted in the middle
of the night while some JDBC data source happens to be unavailable,
the storage config is not kicked into the disabled state.
When I send in a rapid spate of queries, the HikariCP pool grows
accordingly, up to the configured max.
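A toy model may help make the behaviour above concrete. The sketch below is not HikariCP's code. In HikariCP, idle connections above minimumIdle are retired by its idleTimeout setting, and callers wait up to connectionTimeout when the pool is exhausted. `LazyPool` and its methods are hypothetical names, and "connections" are plain integer ids. The model opens nothing at startup, reuses a cached connection for serial queries, grows only under concurrent demand up to its maximum, and shrinks back to an idle floor of 0.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of a lazy, bounded connection pool (hypothetical; NOT HikariCP's
 * implementation). "Connections" are just integer ids.
 */
final class LazyPool {
    private final int maxSize;
    private final Deque<Integer> idle = new ArrayDeque<>(); // connections waiting for reuse
    private int alive = 0;  // connections currently open (idle + in use)
    private int nextId = 1; // id handed to the next newly opened connection

    LazyPool(int maxSize) {
        this.maxSize = maxSize; // note: the constructor opens nothing
    }

    /** Borrows an idle connection, opening a new one only if none is idle. */
    int borrow() {
        Integer id = idle.pollFirst();
        if (id != null) {
            return id; // reuse a cached connection
        }
        if (alive >= maxSize) {
            // A real pool would make the caller wait for a while; this toy fails immediately.
            throw new IllegalStateException("pool exhausted at " + maxSize);
        }
        alive++; // simulate connecting out to the data source
        return nextId++;
    }

    /** Returns a connection so that the next query can reuse it. */
    void release(int id) {
        idle.addFirst(id);
    }

    /** Closes idle connections until only minIdle remain open (models idle expiry). */
    void evictIdleDownTo(int minIdle) {
        while (alive > minIdle && !idle.isEmpty()) {
            idle.pollLast();
            alive--;
        }
    }

    int alive() {
        return alive;
    }

    public static void main(String[] args) {
        LazyPool pool = new LazyPool(10);
        System.out.println("after startup: " + pool.alive()); // 0

        // Two queries in series share one connection.
        pool.release(pool.borrow());
        pool.release(pool.borrow());
        System.out.println("after serial queries: " + pool.alive()); // 1

        // A burst of four concurrent queries grows the pool to four.
        int[] held = new int[4];
        for (int i = 0; i < held.length; i++) {
            held[i] = pool.borrow();
        }
        for (int id : held) {
            pool.release(id);
        }
        System.out.println("after burst: " + pool.alive()); // 4

        // Once idle connections expire, an idle floor of 0 closes them all.
        pool.evictIdleDownTo(0);
        System.out.println("after idle expiry: " + pool.alive()); // 0
    }
}
```

The point of the model is the separation between the idle floor (what stays open when nothing is happening) and the maximum (what a burst may reach).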
On 2021/10/19 06:42, James Turton wrote:
Hi devs
I'd like to propose a change to the defaults for our outbound
connection pool management, at least for JDBC but perhaps ultimately
wherever we can manage it. Currently we are eager about initiating
outbound JDBC connections, bringing up 10 per storage config per
drillbit. For example, if a user creates 3 storage configs pointing
to a single DBMS (the configs differing in their DB path and
credentials, say) on a cluster of 5 drillbits then we'll bring up
10x3x5 = 150 connections as soon as we can and try to keep them up
permanently. The fixed pool size of 10 is a default we picked up from
HikariCP, which surely set it with application servers in mind.
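The arithmetic above can be written out as a tiny calculation. `ConnectionBudget` and `idleConnections` are hypothetical names used only to show how the per-pool idle floor multiplies across storage configs and drillbits. The "fixed pool" case relies on HikariCP's documented defaults: maximumPoolSize is 10 and minimumIdle defaults to the same value, which makes the pool fixed-size.

```java
/** Hypothetical helper: outbound connections held open while every pool sits at its idle floor. */
final class ConnectionBudget {
    private ConnectionBudget() {}

    /** Each drillbit keeps one pool per storage config, so the idle floor multiplies out. */
    static int idleConnections(int idleFloorPerPool, int storageConfigs, int drillbits) {
        return idleFloorPerPool * storageConfigs * drillbits;
    }

    public static void main(String[] args) {
        int configs = 3;
        int drillbits = 5;
        // Today: minimumIdle defaults to maximumPoolSize (10), i.e. a fixed pool.
        System.out.println("fixed pool of 10: " + idleConnections(10, configs, drillbits)); // 150
        // A baseline of 1 idle connection per pool.
        System.out.println("idle floor of 1:  " + idleConnections(1, configs, drillbits)); // 15
        // An idle floor of 0.
        System.out.println("idle floor of 0:  " + idleConnections(0, configs, drillbits)); // 0
    }
}
```

Under a burst, each pool may still grow to maximumPoolSize, so 150 stays the ceiling in this example; what a smaller idle floor changes is the steady-state count.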
We've had a report from the field of a MySQL server declining to
provide said 150 connections, leaving the Drill user unable to
proceed. Additionally, as you can imagine, almost all 150 connections
will be idle most of the time for typical Drill cluster workloads.
Furthermore, while connection pools are ubiquitous in the OLTP world,
they are rare in the OLAP world, where the cost of creating and
destroying a connection is negligible compared to the cost of a single
user query, and where the benefits that unpooled, per-user connections
bring over shared pools (per-user access control, resource management
and session management) are valuable. Bringing these latter benefits to
Drill's outbound JDBC connections is not in the scope of this email; the
only point I'm making here is that "traditionally, OLAP environments
have avoided connection pools because the losses far outweigh the gains".
In light of the above I suggest that we transition from eager to lazy
outbound JDBC connections, more like Apache Spark (I'm told). I
propose initially that we only change our *default* HikariCP
configuration to maintain small, finitely scalable pools (e.g.
baseline 1, up to 10) instead of fixed pools. The HikariCP
configuration is already overridable today for users that prefer the
current eager connection behaviour.
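For illustration, the proposed defaults could be expressed with HikariCP's documented property names (`minimumIdle`, `maximumPoolSize`) in a `java.util.Properties`, which HikariCP's `HikariConfig(Properties)` constructor accepts. `LazyPoolDefaults` and the merge helper are hypothetical; how Drill actually merges user overrides into its pool setup is an assumption here, not shown by this thread.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of the proposed lazy defaults, using HikariCP property names.
 * How Drill merges these with user settings is assumed, not taken from Drill's code.
 */
final class LazyPoolDefaults {
    private LazyPoolDefaults() {}

    /** Proposed defaults: a small idle baseline that can grow on demand. */
    static Properties proposed() {
        Properties p = new Properties();
        p.setProperty("minimumIdle", "1");      // baseline: keep one idle connection
        p.setProperty("maximumPoolSize", "10"); // grow on demand up to ten
        return p;
    }

    /** Overlays user overrides on the proposed defaults (overrides win). */
    static Properties withOverrides(Properties overrides) {
        Properties merged = proposed();
        for (String key : overrides.stringPropertyNames()) {
            merged.setProperty(key, overrides.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        // A user who prefers today's eager behaviour restores a fixed pool of 10.
        Properties eager = new Properties();
        eager.setProperty("minimumIdle", "10");
        Properties merged = withOverrides(eager);
        System.out.println("minimumIdle=" + merged.getProperty("minimumIdle")
                + ", maximumPoolSize=" + merged.getProperty("maximumPoolSize"));
    }
}
```

Keeping the change to defaults only means nothing is taken away: setting minimumIdle equal to maximumPoolSize gives back a fixed, eagerly filled pool.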
James