Hi,

Thanks for sharing the current status! I understand.

BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
before we release the first version? (I want to use ADBC
from Ruby.) Or should I wait for the first release? If I can
work on it now, I'll open pull requests for it.


Thanks,
--
kou

In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
  "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
  "David Li" <lidav...@apache.org> wrote:

> Thank you Kou!
>
> At least initially, I don't think I'll be able to complete the Dataset
> integration in time. So 10.0.0 probably won't ship with a hard dependency.
> That said I am hoping to have PyArrow take an optional dependency (so Flight
> SQL can finally be available from Python).
>
> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>> Hi,
>>
>> As a maintainer of Linux packages, I want apache/arrow-adbc
>> to be released before apache/arrow is released so that
>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>> .deb/.rpm.
>>
>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>> apache/arrow's .deb/.rpm needs to depend on
>> apache/arrow-adbc's .deb/.rpm.)
>>
>> We can add .deb/.rpm related files
>> (dev/tasks/linux-packages/ in apache/arrow) to
>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>>
>> FYI: I did it for datafusion-contrib/datafusion-c:
>>
>> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
>> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>>
>> I can work on it in apache/arrow-adbc.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>>   "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
>>   "David Li" <lidav...@apache.org> wrote:
>>
>>> Fair enough, thank you. I'll try to expand a bit.
>>> (Sorry for the wall of text that follows…)
>>>
>>> These are the components:
>>>
>>> - Core adbc.h header
>>> - Driver manager for C/C++
>>> - Flight SQL-based driver
>>> - Postgres-based driver (WIP)
>>> - SQLite-based driver (more of a testbed for me than an actual component -
>>>   I don't think we'd actually distribute this)
>>> - Java core interfaces
>>> - Java driver manager
>>> - Java JDBC-based driver
>>> - Java Flight SQL-based driver
>>> - Python driver manager
>>>
>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL drivers
>>> get moved to the main Arrow repo and distributed as part of the regular
>>> Arrow releases.
>>>
>>> For the rest of the components: they could be packaged individually, but
>>> versioned and released together. Also, each C/C++ driver probably needs a
>>> corresponding Python package so Python users do not have to futz with
>>> shared library configurations. (See [1].) So for instance, installing
>>> PyArrow would also give you the Flight SQL driver, and
>>> `pip install adbc_postgres` would get you the Postgres-based driver.
>>>
>>> That would mean setting up separate CI, release, etc. (and eventually
>>> linking Crossbow & Conbench as well?). That does mean duplication of
>>> effort, but the trade off is avoiding bloating the main release process
>>> even further. However, I'd like to hear from those closer to the release
>>> process on this subject - if it would make people's lives easier, we could
>>> merge everything into one repo/process.
>>>
>>> Integrations would be distributed as part of their respective packages
>>> (e.g. Arrow Dataset would optionally link to the driver manager). So the
>>> "part of Arrow 10.0.0" aspect means having a stable interface for adbc.h,
>>> and getting the Flight SQL drivers into the main repo.
>>>
>>> [1]: https://github.com/apache/arrow-adbc/issues/53
>>>
>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>>> "David Li" <lidav...@apache.org> wrote:
>>>>> Since it's been a while, I'd like to give an update. There are also a few
>>>>> questions I have around distribution.
>>>>>
>>>>> Currently:
>>>>> - Supported in C, Java, and Python.
>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and SQLite,
>>>>>   with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>>> - For Python, there's low-level bindings to the C API, and the DBAPI
>>>>>   interface on top of that (+a few extension methods resembling
>>>>>   DuckDB/Turbodbc).
>>>>>
>>>>> There's drafts of integration with Ibis [1], DBI (R), and DuckDB. (I'd
>>>>> like to thank Hannes and Kirill for their comments, as well as Antoine,
>>>>> Dewey, and Matt here.)
>>>>>
>>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm not
>>>>> sure how we would like to handle packaging and distribution. In
>>>>> particular, there are several sub-components for each language (the
>>>>> driver manager + the drivers), increasing the work. Any thoughts here?
>>>>
>>>> Sorry, forgot to answer here. But I think your question is too broadly
>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>>
>>>>> I'm also wondering how we want to handle this in terms of specification -
>>>>> I assume we'd consider the core header file/Java interfaces a spec like
>>>>> the C Data Interface/Flight RPC, and vote on them/mirror them into the
>>>>> format/ directory?
>>>>
>>>> That sounds like the right way to me indeed.
>>>>
>>>> Regards
>>>>
>>>> Antoine.