Hi Paul, 
I did take a look at the JDBC plugin and it appeared that the code to create 
the actual connection was here:  

https://github.com/apache/drill/blob/e1d4d511da553c5aa82270d9db5bb3bb0519cc17/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L112-#L136

The comments there suggest that the connection is opened when the 
plugin is created.  It's an interesting situation because when I queried 
Cassandra, it took about 4 seconds to get results into Drill even though 
Cassandra returned the results virtually instantly. When I stepped through the 
code, I found that it was taking 2 seconds to create the connection, and that 
was happening twice, which would account for nearly all of the 4 seconds (more 
debugging needed on my part). If the connection could be created when the 
storage plugin is enabled or on Drill startup, this thing would be super fast! 

Assuming that's not the case with the JDBC plugin, where would a connection 
pool need to be instantiated?
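
To make the "open once, reuse for every query" idea concrete, here is a rough 
sketch in plain Java. It is not Drill's or the Cassandra driver's actual API: 
SharedConnection and its connector parameter are made-up names, and the slow 
connect step is passed in as a Supplier so the sketch stays self-contained.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder (not part of Drill or any Cassandra driver): opens an
 * expensive resource on first use and hands the same instance to every caller.
 */
final class SharedConnection<T extends AutoCloseable> implements AutoCloseable {
  // The slow "connect" step, supplied by the caller (assumption: safe to call again after close()).
  private final Supplier<T> connector;
  // Stays null until the first get(), and is reset to null by close().
  private T connection;

  SharedConnection(Supplier<T> connector) {
    this.connector = connector;
  }

  /** Returns the shared connection, running the connector only if none is open. */
  synchronized T get() {
    if (connection == null) {
      connection = connector.get();
    }
    return connection;
  }

  /** Closes the connection if one is open; a later get() would reconnect. */
  @Override
  public synchronized void close() throws Exception {
    if (connection != null) {
      connection.close();
      connection = null;
    }
  }
}
```

If a holder like this were kept as a field of the storage plugin object, every 
query going through that plugin instance would pay the connect cost at most 
once. Whether the plugin instance actually lives across queries in Drill is an 
assumption that still needs to be verified, and a real pool would also need the 
idle-timeout and keep-alive handling Paul describes.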

-- C




> On Jan 17, 2020, at 1:43 PM, Paul Rogers <[email protected]> wrote:
> 
> Hi Charles,
> 
> I've seen nothing like this in my travels through Drill code. My guess is 
> that you'd have to create a connection pool. I'd also guess that connection 
> pool implementations exist that could be reused.
> 
> Drill is multi-threaded: any one Drillbit could be running many concurrent 
> Cassandra scans, so the pool would want to open connections as needed, then 
> perhaps close them after being unused for a time. Depending on the Cassandra 
> semantics, some thread may have to do work to keep idle connections alive.
> 
> Most execution-related code in Drill is designed to be shared-nothing; this 
> would be the first (or one of the few) instances where fragments and/or 
> queries must coordinate.
> 
> Arina mentioned a JIRA ticket in the context of the JDBC storage plugin 
> discussion. Is anyone currently working on this issue?
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Friday, January 17, 2020, 04:56:39 AM PST, Charles Givre 
> <[email protected]> wrote:  
> 
> Hello Drill Devs,
> I have a question for you.  I'm working on a storage plugin for Apache 
> Cassandra, and I've got the queries mostly working.  
> Connections to Cassandra are meant to be opened once and remain open, so 
> they are "heavy".  It takes about 2 seconds to connect to the Cassandra 
> instance on my local machine.  Once the connection is established, the 
> queries are very fast.  Is there a way to open the connection once and have 
> it persist somehow so that we don't have that overhead for each query?
> 
> I seem to recall a similar discussion for the JDBC storage plugin.
> Thanks,
> -- C
