Re: Connection Persistence (Cassandra Storage Plugin)

Charles Givre Sun, 19 Jan 2020 15:27:10 -0800

Hey Paul, 
I was messing with the Cassandra plugin last night and moved the connection 
logic to the actual StoragePlugin class.  This, combined with a few null 
checks, seemed to do the trick: queries are now virtually instantaneous!
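
For anyone following along, here is a minimal sketch of the pattern (open 
once behind a null check, then reuse). It is not the actual plugin code: 
CachedConnection and its opener argument are hypothetical names, and the 
supplier stands in for whatever call opens the Cassandra session.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that opens an expensive, thread-safe connection once
 * and hands the same instance to every caller (planner and scans alike).
 */
class CachedConnection<T extends AutoCloseable> implements AutoCloseable {
  // Assumption: the opener wraps the slow connect call.
  private final Supplier<T> opener;
  // Null until the first get(), and again after close().
  private T connection;

  CachedConnection(Supplier<T> opener) {
    this.opener = opener;
  }

  /** Returns the shared connection, opening it only if none is cached. */
  public synchronized T get() {
    if (connection == null) {
      connection = opener.get();
    }
    return connection;
  }

  /** Closes the cached connection, if any, so a later get() reopens it. */
  @Override
  public synchronized void close() {
    if (connection == null) {
      return;
    }
    try {
      connection.close();
    } catch (Exception e) {
      throw new IllegalStateException("Failed to close connection", e);
    } finally {
      connection = null;
    }
  }
}
```

The plugin instance would own one such holder, so the first query pays the 
connect cost and later ones reuse the session. Something still has to call 
close() when the plugin is shut down.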


What remains to be done:
1.  Filter pushdown not working:  I'm going to wait until the Base Storage PR 
is committed and attempt to use that.  This plugin seems like a really obvious 
candidate for that.  I've been digging around to see if I can figure out how to 
use the Calcite adapters and there is nothing out there WRT documentation or 
example code.  I saw that the Drill JDBC storage plugin uses the Calcite 
adapter so I may try to follow that model.

2.  Fix data types:  Right now, the plugin returns everything as a string.  
Obviously, that needs to get fixed, so I'll need to rewrite the RecordReader 
class to use EVF. 

3.  Other pushdowns:  This seems like a really good candidate for Limit and 
Aggregate pushdowns as well.  If I can figure out how to do that and/or use 
the Calcite adapter to do so, I'll work on that.  

4.  Write documentation and add additional configuration options.

If we can get the Base Storage PR committed, my goal is to get this ready for 
Drill 1.18. This may be a bit of a stretch, but we'll see.  If anyone is 
interested, here is a link to my branch[1].  Feedback is definitely 
appreciated, but in no way is this ready for code review.
Best,
-- C


[1]: https://github.com/cgivre/drill/tree/storage-cassandra 



> On Jan 17, 2020, at 5:37 PM, Paul Rogers <[email protected]> wrote:
> 
> Hi Charles,
> 
> Poked around a bit. Turns out that the Cassandra client seems to work a bit 
> differently than a JDBC client. From the JavaDoc page: "Session instances are 
> thread-safe and usually a single instance is enough per application." Given 
> this, you might be able to cache a single connection (per keyspace) to be 
> shared by the planner and scans. [1]
> 
> Still need some global object to open, maintain and close the connection, so 
> something would have to be added to Drill to support this.
> 
> JDBC is harder to work with because connection access must be serialized: 
> only one thread can use the connection at a time. More to the point, 
> transactions must be serialized; JDBC can't support two separate 
> transactions on a single JDBC connection.
> 
> 
> Thanks,
> - Paul
> 
> 
> [1] 
> https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html
> 
> 
> 
> 
>    On Friday, January 17, 2020, 04:56:39 AM PST, Charles Givre 
> <[email protected]> wrote:  
> 
> Hello Drill Devs
> I have a question for you.  I'm working on a storage plugin for Apache 
> Cassandra.  I've got the queries mostly working, but I have a question.  
> Connections to Cassandra are meant to be opened once and remain open and so 
> they are "heavy".  It takes about 2 seconds to connect to the Cassandra 
> instance on my local machine.  Once the connection happens, the queries are 
> very fast.  Is there a way to open the connection once and have it persist 
> somehow so that we don't pay that overhead for each query?
> 
> I seem to recall a similar discussion for the JDBC storage plugin.
> Thanks,
> -- C
