Hey Paul,
I was messing with the Cassandra plugin last night and moved the connection logic to the actual StoragePlugin class. This, combined with a few null checks, seemed to do the trick: queries are now virtually instantaneous!

What remains to be done:

1. Filter pushdown not working: I'm going to wait until the Base Storage PR is committed and attempt to use that. This plugin seems like a really obvious candidate for that. I've been digging around to see if I can figure out how to use the Calcite adapters and there is nothing out there WRT documentation or example code. I saw that the Drill JDBC storage plugin uses the Calcite adapter so I may try to follow that model.
2. Fix data types: Right now, the plugin returns everything as a string. Obviously, that needs to get fixed, so I'll need to rewrite the RecordReader class to use EVF.
3. Other push downs: This seems like a really good candidate for Limit and Aggregate push downs as well. If I can figure out how to do that and/or use the Calcite adapter to do so, I'll work on that.
4. Write documentation and additional configuration options: If we can get the Base Storage PR committed, my goal is to get this ready for Drill 1.18. This may be a bit of a stretch, but we'll see.

If anyone is interested, here is a link to my branch[1]. Feedback is definitely appreciated, but in no way is this ready for code review.

Best,
-- C

[1]: https://github.com/cgivre/drill/tree/storage-cassandra

> On Jan 17, 2020, at 5:37 PM, Paul Rogers <[email protected]> wrote:
>
> Hi Charles,
>
> Poked around a bit. Turns out that the Cassandra client seems to work a bit
> differently than a JDBC client. From the JavaDoc page: "Session instances are
> thread-safe and usually a single instance is enough per application." Given
> this, you might be able to cache a single connection (per keyspace) to be
> shared by the planner and scans. [1]
>
> Still need some global object to open, maintain and close the connection, so
> something would have to be added to Drill to support this.
>
> JDBC is harder to work with because connection access must be serialized:
> only one thread can use the connection at a time.
> More to the point, transactions must be serialized; JDBC can't support two
> separate transactions on a single JDBC connection.
>
> Thanks,
> - Paul
>
> [1]
> https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Session.html
>
> On Friday, January 17, 2020, 04:56:39 AM PST, Charles Givre
> <[email protected]> wrote:
>
> Hello Drill Devs
> I have a question for you. I'm working on a storage plugin for Apache
> Cassandra. I've got the queries mostly working, but I have a question.
> Connections to Cassandra are meant to be opened once and remain open and so
> they are "heavy". It takes about 2 seconds to connect to the Cassandra
> instance on my local machine. Once the connection happens, the queries are
> very fast. I'm wondering is there a way to open the connection once and have
> it persist somehow so that we don't have that overhead for each query?
>
> I seem to recall a similar discussion for the JDBC storage plugin.
> Thanks,
> -- C
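
For reference, the "open once, share per keyspace" idea discussed in this thread could look roughly like the sketch below. This is not the code in the branch: SessionCache, FakeSession and SessionCacheDemo are hypothetical names made up for illustration, and the sketch is generic over any AutoCloseable so it runs without the Cassandra driver. In a real plugin the opener would presumably wrap the driver's connect call for a keyspace, and the cache would be owned by the StoragePlugin instance so that planning and scans share it; both of those are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical holder that opens one session per keyspace on first use
 * and hands the same instance to every later caller.
 */
class SessionCache<S extends AutoCloseable> implements AutoCloseable {
  private final Map<String, S> sessions = new ConcurrentHashMap<>();
  private final Function<String, S> opener;

  SessionCache(Function<String, S> opener) {
    this.opener = opener;
  }

  /** Returns the cached session for the keyspace, opening it only if absent. */
  S get(String keyspace) {
    // computeIfAbsent on ConcurrentHashMap applies the opener at most once per key,
    // so concurrent callers for the same keyspace end up with the same instance.
    return sessions.computeIfAbsent(keyspace, opener);
  }

  /** Closes every cached session; meant to run once, when the owner shuts down. */
  @Override
  public void close() {
    RuntimeException failure = null;
    for (S session : sessions.values()) {
      try {
        session.close();
      } catch (Exception e) {
        // Remember the first failure but keep closing the remaining sessions.
        if (failure == null) {
          failure = new IllegalStateException("Failed to close session", e);
        }
      }
    }
    sessions.clear();
    if (failure != null) {
      throw failure;
    }
  }
}

/** Stand-in for a real driver session, used only so the sketch is runnable. */
class FakeSession implements AutoCloseable {
  final String keyspace;
  boolean closed;

  FakeSession(String keyspace) {
    this.keyspace = keyspace;
  }

  @Override
  public void close() {
    closed = true;
  }
}

class SessionCacheDemo {
  public static void main(String[] args) {
    int[] opens = {0}; // counts how often the (expensive) open actually happens
    SessionCache<FakeSession> cache = new SessionCache<>(ks -> {
      opens[0]++;
      return new FakeSession(ks);
    });

    FakeSession forPlanner = cache.get("demo_ks");
    FakeSession forScan = cache.get("demo_ks");
    System.out.println("same instance: " + (forPlanner == forScan));
    System.out.println("opens: " + opens[0]);

    cache.close();
    System.out.println("closed: " + forPlanner.closed);
  }
}
```

The point of the design is that the roughly two-second connect cost is paid once per keyspace rather than once per query. It relies on the session being thread-safe, as the JavaDoc quoted above says it is; the same approach would not be safe for a JDBC connection.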
