paleolimbot commented on issue #2245: URL: https://github.com/apache/arrow-adbc/issues/2245#issuecomment-2409095270
> I'm personally most familiar with the Cassandra C/C++ driver as well as Arrow C++.

> Matt also mentioned that there is now an ADBC [driver framework](https://github.com/apache/arrow-adbc/blob/main/c/driver/framework).

I'm hoping to finish it this week, but there's a work-in-progress tutorial on how to get started building a driver in C++ using nanoarrow/the framework here: https://github.com/apache/arrow-adbc/pull/2186. Arrow C++ presents a packaging problem (e.g., it's difficult or impossible to make an R driver wrapper, and a Python wrapper would require pinning a version of pyarrow until we sort out how to put two different Arrow C++ versions in the same process), which is probably why Matt recommended nanoarrow.

> However, if there's good reason to implement the driver in a different language, I'm open to that and happy to get up to speed.

It's a bit subjective, but all our existing drivers lean on the most Arrow-ish SDK available for the database (e.g., Postgres has libpq for C, so we implemented that driver in C++; Snowflake and BigQuery have Arrow integrations in their Go connectors, so we wrote those in Go). I have no idea what Cassandra provides, but if it already had a fairly complete Go or Rust client and nothing for C++, that might be a good reason to implement it in one of those languages. The fact that you know C++ and you're motivated counts for a lot, though!

> Matt mentioned that, before implementing anything, it would be good to stand up a Cassandra node/cluster in CI

We have some `docker compose` services for databases for this purpose. You could do a PR first that makes it so that we can do `docker compose up apache-cassandra-test` (there's a sketch of what that might look like below). Since I think you would be a "first-time contributor", this would also make it so that the PR where you actually implement the driver doesn't require one of us to OK the CI jobs after every push. (Apologies if I understand Cassandra too poorly and this is not a good fit!)

> so we would likely have to implement row ↔ column transposition on the client side (in the driver).

The Postgres driver has an example of writing tests for this without a live connection to the database (the "copy" tests). There's a nanoarrow sketch of that kind of transposition at the end of this comment.

> I'd love to hear any other considerations for implementing this ADBC driver

Where to put it is a good thing to think about... ideally we'd (maybe just speaking for me here) like ADBC connectors to live with the project instead of with us, to spread out the maintenance load (e.g., like DuckDB), but there is also not a straightforward way to use the validation suite outside this repository (or if there is, nobody has tried it yet!). Probably the easiest place to start is as a PR into apache/arrow-adbc, and we can move it once we sort out those details.

Feel free to ping me early and often as you get started (probably everybody else is game too, but I'll let them volunteer themselves 🙂). All of this is helpful for us too, since we've all had build setups for ADBC since the beginning and forget the issues encountered by those new to the project.

> Start implementing the driver along with integration tests? 🚀

If I had to suggest a place to start, it would be to get a "hello world" example running where you can open and close a connection to the database. Then you could perhaps follow that up with implementing the statement's ExecuteQuery for a case with a single type (int32 or string, maybe). All just suggestions (do your thing!) Sketches of both milestones follow below.
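A minimal sketch of what the compose service might look like, patterned on the repo's other database services; the service name `apache-cassandra-test` comes from the suggestion above, but the image tag, port mapping, and healthcheck are assumptions, so match whatever conventions the existing services in the repo's compose file actually use:

```yaml
# Hypothetical service entry for the repo's docker compose file. The image
# tag and healthcheck are assumptions, not the actual file layout.
services:
  apache-cassandra-test:
    image: cassandra:5
    ports:
      - "9042:9042"  # CQL native transport port
    healthcheck:
      # Wait until the node answers CQL before tests run against it
      test: ["CMD-SHELL", "cqlsh -e 'DESCRIBE KEYSPACES'"]
      interval: 10s
      timeout: 10s
      retries: 10
```

With that in place, `docker compose up apache-cassandra-test` from the repo root would stand up a node that both local integration tests and CI jobs can point at.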
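For the "hello world" milestone, here is a rough sketch from the driver user's side using the ADBC C API and the driver manager. The entrypoint name `AdbcDriverCassandraInit` and the `"uri"` option key are placeholders for whatever the new driver ends up exporting, and include paths may differ depending on how you build against arrow-adbc:

```cpp
// Sketch: open and close a connection through the ADBC driver manager.
// AdbcDriverCassandraInit and the "uri" option are hypothetical names for
// whatever the new driver actually defines.
#include <cstdio>

#include <adbc.h>
#include <adbc_driver_manager.h>

// Hypothetical init function exported by the new driver library.
extern "C" AdbcStatusCode AdbcDriverCassandraInit(int version, void* driver,
                                                  struct AdbcError* error);

int main() {
  struct AdbcError error = {};
  struct AdbcDatabase database = {};
  struct AdbcConnection connection = {};

  if (AdbcDatabaseNew(&database, &error) != ADBC_STATUS_OK) return 1;
  // Point the driver manager at the driver's init function instead of
  // loading a shared library by name.
  AdbcDriverManagerDatabaseSetInitFunc(&database, AdbcDriverCassandraInit,
                                       &error);
  // "uri" is an assumed option key; use whatever the driver defines.
  AdbcDatabaseSetOption(&database, "uri", "cassandra://localhost:9042", &error);
  if (AdbcDatabaseInit(&database, &error) != ADBC_STATUS_OK) {
    fprintf(stderr, "database init failed: %s\n",
            error.message ? error.message : "(no message)");
    return 1;
  }

  if (AdbcConnectionNew(&connection, &error) == ADBC_STATUS_OK &&
      AdbcConnectionInit(&connection, &database, &error) == ADBC_STATUS_OK) {
    printf("hello, cassandra!\n");
  }

  // Release in reverse order of creation.
  AdbcConnectionRelease(&connection, &error);
  AdbcDatabaseRelease(&database, &error);
  return 0;
}
```

Once this runs against the compose service, everything after it (statements, ExecuteQuery, metadata) builds on the same database/connection lifecycle.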

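And for the row ↔ column transposition that ExecuteQuery would do per batch, a minimal nanoarrow sketch for the single-type (int32) case. The `rows` vector stands in for whatever the Cassandra client's row iterator provides; this is exactly the kind of function the Postgres driver's "copy" tests exercise against canned data, with no live server required:

```cpp
// Sketch: transpose a stream of row-wise int32 cells into a columnar
// ArrowArray with nanoarrow. The vector stands in for the Cassandra
// client's result iterator.
#include <cstdint>
#include <optional>
#include <vector>

#include <nanoarrow/nanoarrow.h>

ArrowErrorCode BuildInt32Column(const std::vector<std::optional<int32_t>>& rows,
                                struct ArrowArray* out,
                                struct ArrowError* error) {
  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(out, NANOARROW_TYPE_INT32));
  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(out));
  for (const auto& cell : rows) {
    if (cell.has_value()) {
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(out, *cell));
    } else {
      // NULL cells in the row stream become validity-bitmap entries.
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendNull(out, 1));
    }
  }
  // Finalize the buffers so the ArrowArray can be handed to the caller.
  return ArrowArrayFinishBuildingDefault(out, error);
}
```

In the real driver this would live behind the statement's ExecuteQuery, with one such builder per column, but testing it as a standalone function first (copy-test style) keeps the feedback loop fast.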