Thanks for the reference. I feel like this must've been shared earlier but I missed it.
Another direction I mean to explore: implementing an Arrow Dataset backend using ADBC, so that we can feed SQL databases (and now Delta Lake) into (Py)Arrow Dataset, and then further into Acero (and the R package's dplyr bindings, ...).

The other thing I'd be curious about is if we can generalize this subset of SQL/Substrait to drivers for other 'storage layers' like Apache Iceberg and Apache Hudi.

On Mon, Jan 16, 2023, at 17:53, Will Jones wrote:
>> You could do something like what Matt Topol's done for Go
>
> Thank you for the link! That's very similar to what I am thinking for Rust.
> I will look at that as a reference. :)
>
>> What do you plan for a "query" to mean to the ADBC Delta Lake driver? Would
>> that be a subset of Substrait that gets mapped to a table scan (with
>> optional filter/selection)?
>
> Reads are basically a Substrait read relation. Other queries like CREATE
> TABLE, DELETE, UPDATE are passed as simple SQL or Substrait queries. And
> then engines can use the driver as a sink (binding output data as a record
> batch stream) for INSERT, OVERWRITE, and MERGE operations. Further details
> are in the design doc [1].
>
> The audience is query engines that want to add Delta Lake support (read,
> write, modify) without getting deep into the details of the format and
> writer protocol. The latter is especially complex. Whereas a database like
> Postgres will validate new data and handle transaction logic, in Delta Lake
> that responsibility falls on each write.
>
> [1]
> https://docs.google.com/document/d/1ud-iBPg8VVz2N3HxySz9qbrffw6a9I7TiGZJ2MBs7ZE/edit?usp=sharing
>
> On Mon, Jan 16, 2023 at 2:26 PM David Li <lidav...@apache.org> wrote:
>
>> Exciting!
>>
>> You could do something like what Matt Topol's done for Go: define a native
>> Go API for ADBC, then a generic adapter to wrap any Go ADBC driver as a C
>> one. See [1].
>> As a bonus, you can then have a more natural (and safe) API
>> for implementing the actual driver, and relegate the fiddly FFI bits to the
>> adapter.
>>
>> What do you plan for a "query" to mean to the ADBC Delta Lake driver?
>> Would that be a subset of Substrait that gets mapped to a table scan (with
>> optional filter/selection)?
>>
>> [1]: https://github.com/apache/arrow-adbc/pull/347
>>
>> On Mon, Jan 16, 2023, at 16:09, Will Jones wrote:
>> > Andrew and David,
>> >
>> > I'm starting to work on the ADBC connector for Delta Lake (in the delta-rs
>> > repo) [1], written in Rust.
>> >
>> > I'm thinking there's some general code I can factor out to make it easier
>> > for Rust developers to create ADBC drivers. I've created an issue to track
>> > that in the arrow-rs repo [2]. If there's anyone else planning on working
>> > with ADBC in Rust, I would be happy to collaborate.
>> >
>> > Best,
>> >
>> > Will Jones
>> >
>> > [1] https://github.com/delta-io/delta-rs/pull/945
>> > [2] https://github.com/apache/arrow-rs/issues/3540
>> >
>> > On Sun, Jan 15, 2023 at 5:33 AM Andrew Lamb <al...@influxdata.com> wrote:
>> >
>> >> Thanks David -- I think currently the Rust implementation of arrow-flight
>> >> and arrow-sql are being hammered out
>> >>
>> >> There are several projects that are working to implement FlightSQL in
>> >> various stages of completeness (I know of Ballista and IOx) and so I expect
>> >> FlightSQL support to be better in arrow-rs over the next few months. As
>> >> part of that I expect we'll be using the integration tests and contribute
>> >> back to other implementations as needed.
>> >>
>> >> On Sat, Jan 14, 2023 at 9:11 AM David Li <lidav...@apache.org> wrote:
>> >>
>> >> > Thanks Andrew! Several people helped, particularly Kou, Matt, and Jacob,
>> >> > and this release also builds heavily on the nanoarrow project that Dewey is
>> >> > spearheading.
>> >> >
>> >> > I know Rust was neglected for this initial push, but I would like to get
>> >> > around to that someday. (If you're interested, feel free to propose
>> >> > something or start a discussion. My Rust is too, well, rusty to put forward
>> >> > a coherent proposal at the moment.)
>> >> >
>> >> > -David
>> >> >
>> >> > On Fri, Jan 13, 2023, at 16:00, Andrew Lamb wrote:
>> >> > > Thank you David and everyone else who helped make this happen -- really
>> >> > > nice work filling in the Arrow / Database integration story.
>> >> > >
>> >> > > Andrew
>> >> > >
>> >> > > On Tue, Jan 10, 2023 at 8:00 PM David Li <lidav...@apache.org> wrote:
>> >> > >
>> >> > >> The Apache Arrow community is pleased to announce the 0.1.0 release of the
>> >> > >> Apache Arrow ADBC libraries. It includes 63 resolved GitHub issues ([1]).
>> >> > >>
>> >> > >> The release is available now from [2] and [3].
>> >> > >>
>> >> > >> Release notes are available at:
>> >> > >> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.1.0/CHANGELOG.md
>> >> > >>
>> >> > >> What is Apache Arrow?
>> >> > >> ---------------------
>> >> > >> Apache Arrow is a columnar in-memory analytics layer designed to
>> >> > >> accelerate big data. It houses a set of canonical in-memory representations
>> >> > >> of flat and hierarchical data along with multiple language-bindings for
>> >> > >> structure manipulation. It also provides low-overhead streaming and batch
>> >> > >> messaging, zero-copy interprocess communication (IPC), and vectorized
>> >> > >> in-memory analytics libraries. Languages currently supported include C,
>> >> > >> C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.
>> >> > >>
>> >> > >> What is Apache Arrow ADBC?
>> >> > >> --------------------------
>> >> > >> ADBC is a database access abstraction for Arrow-based applications.
>> >> > >> It provides a cross-language API for working with databases while using Arrow
>> >> > >> data, providing an alternative to APIs like JDBC and ODBC for analytical
>> >> > >> applications. For more, see [4].
>> >> > >>
>> >> > >> Please report any feedback to the mailing lists ([5], [6]).
>> >> > >>
>> >> > >> Regards,
>> >> > >> The Apache Arrow Community
>> >> > >>
>> >> > >> [1]: https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A0.1.0+is%3Aclosed
>> >> > >> [2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-0.1.0
>> >> > >> [3]: https://apache.jfrog.io/ui/native/arrow
>> >> > >> [4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
>> >> > >> [5]: https://lists.apache.org/list.html?u...@arrow.apache.org
>> >> > >> [6]: https://lists.apache.org/list.html?dev@arrow.apache.org