Hi devs

I'd like to propose a change to the defaults for our outbound connection pool management, at least for JDBC but perhaps ultimately wherever we can manage it.  Currently we are eager about initiating outbound JDBC connections, bringing up 10 per storage config per drillbit.  For example, if a user creates 3 storage configs pointing to a single DBMS (the configs differing in their DB path and credentials, say) on a cluster of 5 drillbits, then we'll bring up 10 x 3 x 5 = 150 connections as soon as we can and try to keep them up permanently.  The fixed pool size of 10 is a default we picked up from HikariCP, which surely chose it with application servers in mind.
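
For context, HikariCP's stock defaults are what make the pool fixed-size: maximumPoolSize defaults to 10, and minimumIdle defaults to whatever maximumPoolSize is, so the pool fills eagerly and stays full.  Here's a minimal Java sketch of the equivalent explicit configuration (the JDBC URL and credentials are illustrative, and the wiring into Drill's JDBC storage plugin is omitted):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    // Today's effective default: a fixed pool of 10 connections
    // per storage config per drillbit, held open permanently.
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:mysql://dbms.example.com/sales"); // illustrative
    config.setUsername("drill");                              // illustrative
    config.setPassword("...");
    config.setMaximumPoolSize(10); // HikariCP default
    config.setMinimumIdle(10);     // defaults to maximumPoolSize => fixed size
    HikariDataSource ds = new HikariDataSource(config);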

We've had a report from the field of a MySQL server declining to provide said 150 connections, leaving the Drill user unable to proceed.  Additionally, as you can imagine, almost all 150 connections will be idle most of the time for typical Drill cluster workloads.  Furthermore, while connection pools are ubiquitous in the OLTP world, they are rare in the OLAP world, where the cost of creating and destroying connections is negligible compared to the cost of a single user query, and where the per-user access control, resource management and session management that dedicated connections bring over shared pools are valuable.  Bringing these latter benefits to Drill's outbound JDBC connections is not in the scope of this email; the point here is only that, traditionally, OLAP environments have avoided connection pools because the losses far outweigh the gains.

In light of the above I suggest that we transition from eager to lazy outbound JDBC connections, more like Apache Spark does (I'm told).  I propose initially that we change only our *default* HikariCP configuration, so that it maintains small, finitely scalable pools (e.g. baseline 1, up to 10) instead of fixed pools; a sketch follows below.  The HikariCP configuration is already overridable today for users who prefer the current eager connection behaviour.
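
As a concrete sketch (numbers negotiable, URL and credentials again illustrative), the proposed default would look something like this in HikariCP terms; idleTimeout governs how quickly an expanded pool shrinks back to the baseline, and it only takes effect when minimumIdle is below maximumPoolSize:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    // Proposed default: start with one warm connection, grow on demand.
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:mysql://dbms.example.com/sales"); // illustrative
    config.setUsername("drill");                              // illustrative
    config.setPassword("...");
    config.setMinimumIdle(1);       // baseline: a single idle connection
    config.setMaximumPoolSize(10);  // grow up to 10 under concurrent load
    config.setIdleTimeout(600_000); // retire idle connections after 10 minutes
    HikariDataSource ds = new HikariDataSource(config);

With settings like these, the 3-configs-on-5-drillbits example above would hold 15 mostly idle connections at rest instead of 150, while still being able to reach 150 under full concurrent load across the cluster.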

James