paleolimbot commented on issue #2245: URL: https://github.com/apache/arrow-adbc/issues/2245#issuecomment-2409095270
> I'm personally most familiar with the Cassandra C/C++ driver as well as Arrow C++.

> Matt also mentioned that there is now an ADBC [driver framework](https://github.com/apache/arrow-adbc/blob/main/c/driver/framework).

I'm hoping to finish it this week, but there's a work-in-progress tutorial on how to get started building a driver in C++ using nanoarrow/the framework here: https://github.com/apache/arrow-adbc/pull/2186. Arrow C++ presents a packaging problem (e.g., it's difficult or impossible to make an R driver wrapper, and a Python wrapper would require pinning a version of pyarrow until we sort out how to put two different Arrow C++ versions in the same process), which is probably why Matt recommended nanoarrow.

> However, if there's good reason to implement the driver in a different language, I'm open to that and happy to get up to speed.

It's a bit subjective, but all our existing drivers lean on the most Arrow-ish SDK available for the database (e.g., Postgres has libpq for C, so we implemented that driver in C++; Snowflake and BigQuery have Arrow integrations in their Go connectors, so we wrote those in Go). I have no idea what Cassandra provides, but if it already had a fairly complete Go or Rust client and nothing for C++, that might be a good reason to implement it in one of those languages. The fact that you know C++ and you're motivated counts for a lot, though!

> Matt mentioned that, before implementing anything, it would be good to stand up a Cassandra node/cluster in CI

We have some `docker compose` services for databases for this purpose. You could do a PR first that makes it so that we can do `docker compose up apache-cassandra-test` (there's a sketch of what that might look like below). Since I think you would be a "first-time contributor", this would also make it so that the PR where you actually implement the driver doesn't require one of us to OK the CI jobs after every push. (Apologies if I understand Cassandra too poorly and this is not a good fit!)

> so we would likely have to implement row ↔ column transposition on the client side (in the driver).

The Postgres driver has an example of writing tests for this without a live connection to the database (the "copy" tests). There's a nanoarrow sketch of that kind of transposition at the end of this comment.

> I'd love to hear any other considerations for implementing this ADBC driver

Where to put it is a good thing to think about... ideally we'd (maybe just speaking for me here) like ADBC connectors to live with the project instead of with us, to spread out the maintenance load (e.g., like DuckDB), but there is also not a straightforward way to use the validation suite outside this repository (or if there is, nobody has tried it yet!). Probably the easiest place to start is as a PR into apache/arrow-adbc, and we can move it once we sort out those details.

Feel free to ping me early and often as you get started (probably everybody else is game too, but I'll let them volunteer themselves 🙂). All of this is helpful for us too, since we've all had build setups for ADBC since the beginning and forget the issues encountered by those new to the project.

> Start implementing the driver along with integration tests? 🚀

If I had to suggest a place to start, it would be to get a "hello world" example running where you can open and close a connection to the database. Then you could perhaps follow that up with implementing the statement's ExecuteQuery for a case with a single type (int32 or string, maybe). All just suggestions (do your thing!) Sketches of both milestones follow below.
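A minimal sketch of what the compose service might look like, patterned on the repo's other database services; the service name `apache-cassandra-test` comes from the suggestion above, but the image tag, port mapping, and healthcheck are assumptions, so match whatever conventions the existing services in the repo's compose file actually use:

```yaml
# Hypothetical service entry for the repo's docker compose file. The image
# tag and healthcheck are assumptions, not the actual file layout.
services:
  apache-cassandra-test:
    image: cassandra:5
    ports:
      - "9042:9042"  # CQL native transport port
    healthcheck:
      # Wait until the node answers CQL before tests run against it
      test: ["CMD-SHELL", "cqlsh -e 'DESCRIBE KEYSPACES'"]
      interval: 10s
      timeout: 10s
      retries: 10
```

With that in place, `docker compose up apache-cassandra-test` from the repo root would stand up a node that both local integration tests and CI jobs can point at.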
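For the "hello world" milestone, here is a rough sketch from the driver user's side using the ADBC C API and the driver manager. The entrypoint name `AdbcDriverCassandraInit` and the `"uri"` option key are placeholders for whatever the new driver ends up exporting, and include paths may differ depending on how you build against arrow-adbc:

```cpp
// Sketch: open and close a connection through the ADBC driver manager.
// AdbcDriverCassandraInit and the "uri" option are hypothetical names for
// whatever the new driver actually defines.
#include <cstdio>

#include <adbc.h>
#include <adbc_driver_manager.h>

// Hypothetical init function exported by the new driver library.
extern "C" AdbcStatusCode AdbcDriverCassandraInit(int version, void* driver,
                                                  struct AdbcError* error);

int main() {
  struct AdbcError error = {};
  struct AdbcDatabase database = {};
  struct AdbcConnection connection = {};

  if (AdbcDatabaseNew(&database, &error) != ADBC_STATUS_OK) return 1;
  // Point the driver manager at the driver's init function instead of
  // loading a shared library by name.
  AdbcDriverManagerDatabaseSetInitFunc(&database, AdbcDriverCassandraInit,
                                       &error);
  // "uri" is an assumed option key; use whatever the driver defines.
  AdbcDatabaseSetOption(&database, "uri", "cassandra://localhost:9042", &error);
  if (AdbcDatabaseInit(&database, &error) != ADBC_STATUS_OK) {
    fprintf(stderr, "database init failed: %s\n",
            error.message ? error.message : "(no message)");
    return 1;
  }

  if (AdbcConnectionNew(&connection, &error) == ADBC_STATUS_OK &&
      AdbcConnectionInit(&connection, &database, &error) == ADBC_STATUS_OK) {
    printf("hello, cassandra!\n");
  }

  // Release in reverse order of creation.
  AdbcConnectionRelease(&connection, &error);
  AdbcDatabaseRelease(&database, &error);
  return 0;
}
```

Once this runs against the compose service, everything after it (statements, ExecuteQuery, metadata) builds on the same database/connection lifecycle.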

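And for the row ↔ column transposition that ExecuteQuery would do per batch, a minimal nanoarrow sketch for the single-type (int32) case. The `rows` vector stands in for whatever the Cassandra client's row iterator provides; this is exactly the kind of function the Postgres driver's "copy" tests exercise against canned data, with no live server required:

```cpp
// Sketch: transpose a stream of row-wise int32 cells into a columnar
// ArrowArray with nanoarrow. The vector stands in for the Cassandra
// client's result iterator.
#include <cstdint>
#include <optional>
#include <vector>

#include <nanoarrow/nanoarrow.h>

ArrowErrorCode BuildInt32Column(const std::vector<std::optional<int32_t>>& rows,
                                struct ArrowArray* out,
                                struct ArrowError* error) {
  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(out, NANOARROW_TYPE_INT32));
  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(out));
  for (const auto& cell : rows) {
    if (cell.has_value()) {
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(out, *cell));
    } else {
      // NULL cells in the row stream become validity-bitmap entries.
      NANOARROW_RETURN_NOT_OK(ArrowArrayAppendNull(out, 1));
    }
  }
  // Finalize the buffers so the ArrowArray can be handed to the caller.
  return ArrowArrayFinishBuildingDefault(out, error);
}
```

In the real driver this would live behind the statement's ExecuteQuery, with one such builder per column, but testing it as a standalone function first (copy-test style) keeps the feedback loop fast.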