Re: [ANNOUNCE] Apache Arrow ADBC 0.1.0 Released

David Li Mon, 16 Jan 2023 16:01:03 -0800

Thanks for the reference. I feel like this must've been shared earlier but I 
missed it.


Another direction I mean to explore: implementing an Arrow Dataset backend 
using ADBC, so that we can feed SQL databases (and now Delta Lake) into 
(Py)Arrow Dataset, and then further into Acero (and the R package's dplyr 
bindings, ...). 
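
To make that idea a bit more concrete, here is a rough sketch of the shape such a backend might take: a dataset-style facade that pushes the column projection and filter down into SQL and yields results lazily in batches. Everything here is an assumption for illustration. `SqlDataset` and `to_batches` are hypothetical names, Python's stdlib `sqlite3` stands in for an ADBC connection, and plain row tuples stand in for Arrow record batches; no real ADBC or PyArrow Dataset API is used.

```python
import sqlite3
from typing import Iterator, Optional, Sequence


class SqlDataset:
    """Hypothetical dataset-style facade over one SQL table.

    Not a PyArrow or ADBC API: sqlite3 stands in for an ADBC connection,
    and lists of row tuples stand in for Arrow record batches.
    """

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def to_batches(
        self,
        columns: Optional[Sequence[str]] = None,
        where: Optional[str] = None,
        params: Sequence[object] = (),
        batch_size: int = 2,
    ) -> Iterator[list]:
        """Push projection and filter into SQL, then yield rows in batches."""
        # Table and column names are assumed to be trusted identifiers;
        # only the filter values are passed as bound parameters.
        projection = ", ".join(columns) if columns else "*"
        query = f"SELECT {projection} FROM {self.table}"
        if where:
            query += f" WHERE {where}"
        cursor = self.conn.execute(query, tuple(params))
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            yield batch


def demo() -> list:
    """Build a tiny in-memory table and scan it with a filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "a"), (4, "a")],
    )
    dataset = SqlDataset(conn, "events")
    return list(dataset.to_batches(columns=["id"], where="kind = ?", params=["a"]))
```

In a real backend the batches would presumably be Arrow record batches coming straight from the driver, so the consumer (Acero, dplyr bindings, ...) would not need a row-to-column conversion step.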

The other thing I'd be curious about is if we can generalize this subset of 
SQL/Substrait to drivers for other 'storage layers' like Apache Iceberg and 
Apache Hudi.
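
As a sketch of what "generalizing" could look like, assume the shared subset is just a table scan with an optional column selection and filter. The names below (`ScanRequest`, `ScanDriver`, `InMemoryDriver`) are hypothetical and not part of ADBC or Substrait, and the predicate is a plain Python callable rather than a Substrait expression. The point is only that engines could code against one small interface while each storage layer supplies its own driver.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Protocol, Sequence

Row = dict


@dataclass(frozen=True)
class ScanRequest:
    """Hypothetical stand-in for a small read-relation subset."""

    table: str
    columns: Optional[Sequence[str]] = None           # None means all columns
    predicate: Optional[Callable[[Row], bool]] = None  # None means no filter


class ScanDriver(Protocol):
    """Interface an engine would code against (an assumption, not ADBC)."""

    def scan(self, request: ScanRequest) -> Iterator[Row]: ...


class InMemoryDriver:
    """Toy driver over dicts of rows.

    A real driver would instead read the table format's metadata and data
    files (Delta Lake, Iceberg, Hudi, ...).
    """

    def __init__(self, tables: dict) -> None:
        self.tables = tables

    def scan(self, request: ScanRequest) -> Iterator[Row]:
        for row in self.tables[request.table]:
            # Apply the optional filter first, then the optional projection.
            if request.predicate is not None and not request.predicate(row):
                continue
            if request.columns is None:
                yield dict(row)
            else:
                yield {name: row[name] for name in request.columns}


def collect(driver: ScanDriver, request: ScanRequest) -> list:
    """Engine-side helper that works with any driver satisfying ScanDriver."""
    return list(driver.scan(request))
```

For example, `collect(driver, ScanRequest("t", columns=["id"]))` would work unchanged against any driver that implements `scan`, which is the property the generalization is after.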

On Mon, Jan 16, 2023, at 17:53, Will Jones wrote:
>>
>> You could do something like what Matt Topol's done for Go
>>
>
> Thank you for the link! That's very similar to what I am thinking for Rust.
> I will look at that as a reference. :)
>
>> What do you plan for a "query" to mean to the ADBC Delta Lake driver? Would
>> that be a subset of Substrait that gets mapped to a table scan (with
>> optional filter/selection)?
>>
>
> Reads are basically a Substrait read relation. Other queries like CREATE
> TABLE, DELETE, UPDATE are passed as simple SQL or Substrait queries. And
> then engines can use the driver as a sink (binding output data as a record
> batch stream) for INSERT, OVERWRITE, and MERGE operations. Further details
> are in the design doc [1].
>
> The audience is query engines that want to add Delta Lake support (read,
> write, modify) without getting deep into the details of the format and
> writer protocol. The latter is especially complex. Whereas a database like
> Postgres will validate new data and handle transaction logic, in Delta Lake
> that responsibility falls on each writer.
>
> [1]
> https://docs.google.com/document/d/1ud-iBPg8VVz2N3HxySz9qbrffw6a9I7TiGZJ2MBs7ZE/edit?usp=sharing
>
>
> On Mon, Jan 16, 2023 at 2:26 PM David Li <[email protected]> wrote:
>
>> Exciting!
>>
>> You could do something like what Matt Topol's done for Go: define a native
>> Go API for ADBC, then a generic adapter to wrap any Go ADBC driver as a C
>> one. See [1]. As a bonus, you can then have a more natural (and safe) API
>> for implementing the actual driver, and relegate the fiddly FFI bits to the
>> adapter.
>>
>> What do you plan for a "query" to mean to the ADBC Delta Lake driver?
>> Would that be a subset of Substrait that gets mapped to a table scan (with
>> optional filter/selection)?
>>
>> [1]: https://github.com/apache/arrow-adbc/pull/347
>>
>> On Mon, Jan 16, 2023, at 16:09, Will Jones wrote:
>> > Andrew and David,
>> >
>> > I'm starting to work on the ADBC connector for Delta Lake (in the delta-rs
>> > repo) [1], written in Rust.
>> >
>> > I'm thinking there's some general code I can factor out to make it easier
>> > for Rust developers to create ADBC drivers. I've created an issue to track
>> > that in the arrow-rs repo [2]. If there's anyone else planning on working
>> > with ADBC in Rust, I would be happy to collaborate.
>> >
>> > Best,
>> >
>> > Will Jones
>> >
>> > [1] https://github.com/delta-io/delta-rs/pull/945
>> > [2] https://github.com/apache/arrow-rs/issues/3540
>> >
>> > On Sun, Jan 15, 2023 at 5:33 AM Andrew Lamb <[email protected]> wrote:
>> >
>> >> Thanks David -- I think currently the Rust implementations of arrow-flight
>> >> and arrow-sql are being hammered out.
>> >>
>> >> There are several projects that are working to implement FlightSQL in
>> >> various stages of completeness (I know of Ballista and IOx) and so I expect
>> >> FlightSQL support to be better in arrow-rs over the next few months. As
>> >> part of that I expect we'll be using the integration tests and contribute
>> >> back to other implementations as needed.
>> >>
>> >>
>> >>
>> >> On Sat, Jan 14, 2023 at 9:11 AM David Li <[email protected]> wrote:
>> >>
>> >> > Thanks Andrew! Several people helped, particularly Kou, Matt, and Jacob,
>> >> > and this release also builds heavily on the nanoarrow project that Dewey is
>> >> > spearheading.
>> >> >
>> >> > I know Rust was neglected for this initial push, but I would like to get
>> >> > around to that someday. (If you're interested, feel free to propose
>> >> > something or start a discussion. My Rust is too, well, rusty to put forward
>> >> > a coherent proposal at the moment.)
>> >> >
>> >> > -David
>> >> >
>> >> > On Fri, Jan 13, 2023, at 16:00, Andrew Lamb wrote:
>> >> > > Thank you David and everyone else who helped make this happen -- really
>> >> > > nice work filling in the Arrow / Database integration story.
>> >> > >
>> >> > > Andrew
>> >> > >
>> >> > > On Tue, Jan 10, 2023 at 8:00 PM David Li <[email protected]> wrote:
>> >> > >
>> >> > >> The Apache Arrow community is pleased to announce the 0.1.0 release of the
>> >> > >> Apache Arrow ADBC libraries. It includes 63 resolved GitHub issues ([1]).
>> >> > >>
>> >> > >> The release is available now from [2] and [3].
>> >> > >>
>> >> > >> Release notes are available at:
>> >> > >> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.1.0/CHANGELOG.md
>> >> > >>
>> >> > >> What is Apache Arrow?
>> >> > >> ---------------------
>> >> > >> Apache Arrow is a columnar in-memory analytics layer designed to
>> >> > >> accelerate big data. It houses a set of canonical in-memory representations
>> >> > >> of flat and hierarchical data along with multiple language-bindings for
>> >> > >> structure manipulation. It also provides low-overhead streaming and batch
>> >> > >> messaging, zero-copy interprocess communication (IPC), and vectorized
>> >> > >> in-memory analytics libraries. Languages currently supported include C,
>> >> > >> C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.
>> >> > >>
>> >> > >> What is Apache Arrow ADBC?
>> >> > >> --------------------------
>> >> > >> ADBC is a database access abstraction for Arrow-based applications. It
>> >> > >> provides a cross-language API for working with databases while using Arrow
>> >> > >> data, providing an alternative to APIs like JDBC and ODBC for analytical
>> >> > >> applications. For more, see [4].
>> >> > >>
>> >> > >> Please report any feedback to the mailing lists ([5], [6]).
>> >> > >>
>> >> > >> Regards,
>> >> > >> The Apache Arrow Community
>> >> > >>
>> >> > >> [1]: https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A0.1.0+is%3Aclosed
>> >> > >> [2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-0.1.0
>> >> > >> [3]: https://apache.jfrog.io/ui/native/arrow
>> >> > >> [4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
>> >> > >> [5]: https://lists.apache.org/[email protected]
>> >> > >> [6]: https://lists.apache.org/[email protected]
>> >> > >>
>> >> >
>> >>
>>
