I put up [1] as the PR to apache/arrow to vote on. There is a bit of a circular dependency here: my thought is that we will vote on this, then tag the 1.0.0 API standard on apache/arrow-adbc, and finally update the PR before merging. But actual releases of the packages may be a later commit/tag as we set up all the necessary infrastructure.
I'll start a vote thread soon unless there are comments/concerns.

Also, I plan to make a ticket to INFRA for apache/arrow-adbc, to switch the default commit message to "PR title + description" [2] to go along with the conventional commit suggestion, unless anyone has other ideas.

In other news, I'm trying to set up the Flight SQL driver now [3], which will give us actual Python bindings (this adds an optional runtime dependency from PyArrow to ADBC). I would like to get back to the libpq driver [4], set up benchmarks, and start comparing it to other alternatives (pgeon, psycopg, etc.).

[1]: https://github.com/apache/arrow/pull/14079
[2]: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/configuring-commit-squashing-for-pull-requests
[3]: https://github.com/apache/arrow/pull/14082

On Tue, Sep 13, 2022, at 15:12, David Li wrote:
> Ah, thanks for the clarification Neal!
>
> Jacob/Matt: I put up https://github.com/apache/arrow-adbc/pull/124 to
> describe the convention but I wonder if we should partition components
> more granularly than we have so far.
>
> On Mon, Sep 12, 2022, at 12:57, Neal Richardson wrote:
>> On Mon, Sep 12, 2022 at 12:44 PM David Li <lidav...@apache.org> wrote:
>>
>>> I like this idea. I would also like to set up some sort of automated ABI
>>> checker as well (the options I found were GPL/LGPL so I need to figure out
>>> how to proceed).
>>>
>>
>> You should be able to use GPL software in CI, that's no problem. You can
>> even depend on GPL software as long as it is "optional":
>> https://www.apache.org/legal/resolved.html#optional But this would not even
>> count as that since the ABI checker wouldn't be required to use the
>> software.
>>
>> Neal
>>
>>
>>>
>>> I can put up a PR later that formalizes these guidelines in
>>> CONTRIBUTING.md. It looks like there's a pre-commit hook for this sort of
>>> thing too, which'll let us enforce it in CI!
>>>
>>> On Mon, Sep 12, 2022, at 10:18, Matthew Topol wrote:
>>> > Automated semver would be ideal if we can do it.....
>>> >
>>> > There's quite a lot of utilities that exist which would automatically
>>> > handle the versioning if we're using conventional commits.
>>> >
>>> > On Mon, Sep 12 2022 at 02:26:15 PM +0200, Jacob Wujciak
>>> > <ja...@voltrondata.com.INVALID> wrote:
>>> >> + 1 to independent, semver versioning for adbc.
>>> >> I would propose we use conventional commit style [1] commit messages
>>> >> for the pr commits (I assume squash + merge) so we can automate the
>>> >> versioning|double check manual versioning.
>>> >>
>>> >> [1]: <https://www.conventionalcommits.org/>
>>> >>
>>> >> On Thu, Sep 8, 2022 at 6:05 PM David Li <lidav...@apache.org> wrote:
>>> >>
>>> >>> Thanks all, I've updated the header with the proposed versioning
>>> >>> scheme.
>>> >>>
>>> >>> At this point I believe the core definitions are ready. (Note that
>>> >>> I'm explicitly punting on [1][2][3] here.)
>>> >>> Absent further comments, I'd like to do the following:
>>> >>>
>>> >>> - Start a vote on mirroring adbc.h to arrow/format, as well adding
>>> >>>   docs/source/format/ADBC.rst that describes the header, the Java
>>> >>>   interface, the Go interface, and the versioning scheme (I will put
>>> >>>   up a PR beforehand)
>>> >>> - Begin work on CI/packaging, with a release hopefully coinciding
>>> >>>   with Arrow 10.0.0
>>> >>> - Begin work on changes to the main repository, also hopefully in
>>> >>>   time for 10.0.0 (moving the Flight SQL driver to be part of
>>> >>>   apache/arrow; exposing it in PyArrow; possibly also exposing Acero
>>> >>>   via ADBC)
>>> >>>
>>> >>> [1]: <https://github.com/apache/arrow-adbc/issues/46>
>>> >>> [2]: <https://github.com/apache/arrow-adbc/issues/55>
>>> >>> [3]: <https://github.com/apache/arrow-adbc/issues/59>
>>> >>>
>>> >>> On Sat, Sep 3, 2022, at 18:36, Matthew Topol wrote:
>>> >>> > +1 from me on the strategy proposed by Kou.
>>> >>> >
>>> >>> > That would be my preference also. I agree it is preferable to be
>>> >>> > versioned independently.
>>> >>> >
>>> >>> > --Matt
>>> >>> >
>>> >>> > On Sat, Sep 3, 2022, 6:24 PM Sutou Kouhei <k...@clear-code.com> wrote:
>>> >>> >
>>> >>> >> Hi,
>>> >>> >>
>>> >>> >> > Do we have a preference for versioning strategy? Should we
>>> >>> >> > proceed in lockstep with the Arrow C++ library et. al. and
>>> >>> >> > release "ADBC 1.0.0" (the API standard) with "drivers
>>> >>> >> > version 10.0.0", or use an independent versioning scheme?
>>> >>> >> > (For example, release API standard and components at
>>> >>> >> > "1.0.0". Then further releases of components that do not
>>> >>> >> > change the spec would be "1.1", "1.2", ...; if/when we
>>> >>> >> > change the spec, start over with "2.0", "2.1", ...)
>>> >>> >>
>>> >>> >> I like an independent versioning schema.
>>> >>> >> I assume that ADBC doesn't need backward incompatible changes
>>> >>> >> frequently. How about incrementing major version only when ADBC
>>> >>> >> needs any backward incompatible changes?
>>> >>> >>
>>> >>> >> e.g.:
>>> >>> >>
>>> >>> >> 1. Release ADBC (the API standard) 1.0.0
>>> >>> >> 2. Release adbc_driver_manager 1.0.0
>>> >>> >> 3. Release adbc_driver_postgres 1.0.0
>>> >>> >> 4. Add a new feature to adbc_driver_postgres without
>>> >>> >>    any backward incompatible changes
>>> >>> >> 5. Release adbc_driver_postgres 1.1.0
>>> >>> >> 6. Fix a bug in adbc_driver_manager without
>>> >>> >>    any backward incompatible changes
>>> >>> >> 7. Release adbc_driver_manager 1.0.1
>>> >>> >> 8. Add a backward incompatible change to adbc_driver_manager
>>> >>> >> 9. Release adbc_driver_manager 2.0.0
>>> >>> >> 10. Add a new feature to ADBC without any
>>> >>> >>     backward incompatible changes
>>> >>> >> 11. Release ADBC (the API standard) 1.1.0
>>> >>> >>
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> --
>>> >>> >> kou
>>> >>> >>
>>> >>> >> In <7b20d730-b85e-4818-b99e-3335c40c2...@www.fastmail.com>
>>> >>> >> "Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022 16:36:43 -0400,
>>> >>> >> "David Li" <lidav...@apache.org> wrote:
>>> >>> >>
>>> >>> >> > Following up here with some specific questions:
>>> >>> >> >
>>> >>> >> > Matt Topol added some Go definitions [1] (thanks!) I'd assume we
>>> >>> >> > want to vote on those as well?
>>> >>> >> >
>>> >>> >> > How should the process work for Java/Go? For C/C++, I assume we'd
>>> >>> >> > treat it like the C Data Interface and copy adbc.h to format/
>>> >>> >> > after a vote, and then vote on releases of components.
>>> >>> >> > Or do we really only consider the C header as the 'format', with
>>> >>> >> > the others being language-specific affordances?
>>> >>> >> >
>>> >>> >> > What about for Java and for Go? We could vote on and tag a release
>>> >>> >> > for Go, and add a documentation page that links to the Java/Go
>>> >>> >> > definitions at a specific revision (as the equivalent 'format'
>>> >>> >> > definition for Java/Go)? Or would we vendor the entire Java
>>> >>> >> > module/Go package as the 'format'?
>>> >>> >> >
>>> >>> >> > Do we have a preference for versioning strategy? Should we proceed
>>> >>> >> > in lockstep with the Arrow C++ library et. al. and release "ADBC
>>> >>> >> > 1.0.0" (the API standard) with "drivers version 10.0.0", or use an
>>> >>> >> > independent versioning scheme? (For example, release API standard
>>> >>> >> > and components at "1.0.0". Then further releases of components
>>> >>> >> > that do not change the spec would be "1.1", "1.2", ...; if/when we
>>> >>> >> > change the spec, start over with "2.0", "2.1", ...)
>>> >>> >> >
>>> >>> >> > [1]: <https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go>
>>> >>> >> >
>>> >>> >> > -David
>>> >>> >> >
>>> >>> >> > On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
>>> >>> >> >> Hi,
>>> >>> >> >>
>>> >>> >> >> OK. I'll send pull requests for GLib and Ruby soon.
>>> >>> >> >>
>>> >>> >> >>> I'm curious if you have a particular use case in mind.
>>> >>> >> >>
>>> >>> >> >> I don't have any production-ready use case yet but I want to
>>> >>> >> >> implement an Active Record adapter for ADBC. Active Record
>>> >>> >> >> is the O/R mapper for Ruby on Rails. Implementing Web
>>> >>> >> >> application by Ruby on Rails is one of major Ruby use
>>> >>> >> >> cases. So providing Active Record interface for ADBC will
>>> >>> >> >> increase Apache Arrow users in Ruby community.
>>> >>> >> >>
>>> >>> >> >> NOTE: Generally, Ruby on Rails users don't process large
>>> >>> >> >> data but they sometimes need to process large (medium?) data
>>> >>> >> >> in a batch process. Active Record adapter for ADBC may be
>>> >>> >> >> useful for such use case.
>>> >>> >> >>
>>> >>> >> >>> There's a little bit more API cleanup to do [1]. If you
>>> >>> >> >>> have comments on that or anything else, I'd appreciate
>>> >>> >> >>> them. Otherwise, pull requests would also be appreciated.
>>> >>> >> >>
>>> >>> >> >> OK. I'll open issues/pull requests when I find
>>> >>> >> >> something. For now, I think that "MODULE" type library
>>> >>> >> >> instead of "SHARED" type library in CMake terminology
>>> >>> >> >> [cmake] is better for driver modules. (I'll open an issue
>>> >>> >> >> for this later.)
>>> >>> >> >>
>>> >>> >> >> [cmake]: <https://cmake.org/cmake/help/latest/command/add_library.html>
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> Thanks,
>>> >>> >> >> --
>>> >>> >> >> kou
>>> >>> >> >>
>>> >>> >> >> In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
>>> >>> >> >> "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022 15:28:56 -0400,
>>> >>> >> >> "David Li" <lidav...@apache.org> wrote:
>>> >>> >> >>
>>> >>> >> >>> I would be very happy to see GLib/Ruby bindings! I'm curious if
>>> >>> >> >>> you have a particular use case in mind.
>>> >>> >> >>>
>>> >>> >> >>> There's a little bit more API cleanup to do [1]. If you have
>>> >>> >> >>> comments on that or anything else, I'd appreciate them.
>>> >>> >> >>> Otherwise, pull requests would also be appreciated.
>>> >>> >> >>>
>>> >>> >> >>> [1]: <https://github.com/apache/arrow-adbc/issues/79>
>>> >>> >> >>>
>>> >>> >> >>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
>>> >>> >> >>>> Hi,
>>> >>> >> >>>>
>>> >>> >> >>>> Thanks for sharing the current status!
>>> >>> >> >>>> I understand.
>>> >>> >> >>>>
>>> >>> >> >>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
>>> >>> >> >>>> before we release the first version? (I want to use ADBC
>>> >>> >> >>>> from Ruby.) Or should I wait for the first release? If I can
>>> >>> >> >>>> work on it now, I'll open pull requests for it.
>>> >>> >> >>>>
>>> >>> >> >>>> Thanks,
>>> >>> >> >>>> --
>>> >>> >> >>>> kou
>>> >>> >> >>>>
>>> >>> >> >>>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
>>> >>> >> >>>> "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
>>> >>> >> >>>> "David Li" <lidav...@apache.org> wrote:
>>> >>> >> >>>>
>>> >>> >> >>>>> Thank you Kou!
>>> >>> >> >>>>>
>>> >>> >> >>>>> At least initially, I don't think I'll be able to complete the
>>> >>> >> >>>>> Dataset integration in time. So 10.0.0 probably won't ship with
>>> >>> >> >>>>> a hard dependency. That said I am hoping to have PyArrow take an
>>> >>> >> >>>>> optional dependency (so Flight SQL can finally be available from
>>> >>> >> >>>>> Python).
>>> >>> >> >>>>>
>>> >>> >> >>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>>> >>> >> >>>>>> Hi,
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
>>> >>> >> >>>>>> to be released before apache/arrow is released so that
>>> >>> >> >>>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>>> >>> >> >>>>>> .deb/.rpm.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>>> >>> >> >>>>>> apache/arrow's .deb/.rpm needs to depend on
>>> >>> >> >>>>>> apache/arrow-adbc's .deb/.rpm.)
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> We can add .deb/.rpm related files
>>> >>> >> >>>>>> (dev/tasks/linux-packages/ in apache/arrow) to
>>> >>> >> >>>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/tree/main/package>
>>> >>> >> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml>
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> I can work on it in apache/arrow-adbc.
>>> >>> >> >>>>>>
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> Thanks,
>>> >>> >> >>>>>> --
>>> >>> >> >>>>>> kou
>>> >>> >> >>>>>>
>>> >>> >> >>>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>>> >>> >> >>>>>> "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
>>> >>> >> >>>>>> "David Li" <lidav...@apache.org> wrote:
>>> >>> >> >>>>>>
>>> >>> >> >>>>>>> Fair enough, thank you. I'll try to expand a bit.
>>> >>> >> >>>>>>> (Sorry for the wall of text that follows…)
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> These are the components:
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> - Core adbc.h header
>>> >>> >> >>>>>>> - Driver manager for C/C++
>>> >>> >> >>>>>>> - Flight SQL-based driver
>>> >>> >> >>>>>>> - Postgres-based driver (WIP)
>>> >>> >> >>>>>>> - SQLite-based driver (more of a testbed for me than an actual
>>> >>> >> >>>>>>>   component - I don't think we'd actually distribute this)
>>> >>> >> >>>>>>> - Java core interfaces
>>> >>> >> >>>>>>> - Java driver manager
>>> >>> >> >>>>>>> - Java JDBC-based driver
>>> >>> >> >>>>>>> - Java Flight SQL-based driver
>>> >>> >> >>>>>>> - Python driver manager
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL
>>> >>> >> >>>>>>> drivers get moved to the main Arrow repo and distributed as part
>>> >>> >> >>>>>>> of the regular Arrow releases.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> For the rest of the components: they could be packaged
>>> >>> >> >>>>>>> individually, but versioned and released together. Also, each
>>> >>> >> >>>>>>> C/C++ driver probably needs a corresponding Python package so
>>> >>> >> >>>>>>> Python users do not have to futz with shared library
>>> >>> >> >>>>>>> configurations. (See [1].) So for instance, installing PyArrow
>>> >>> >> >>>>>>> would also give you the Flight SQL driver, and
>>> >>> >> >>>>>>> `pip install adbc_postgres` would get you the Postgres-based
>>> >>> >> >>>>>>> driver.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> That would mean setting up separate CI, release, etc. (and
>>> >>> >> >>>>>>> eventually linking Crossbow & Conbench as well?). That does mean
>>> >>> >> >>>>>>> duplication of effort, but the trade off is avoiding bloating
>>> >>> >> >>>>>>> the main release process even further.
>>> >>> >> >>>>>>> However, I'd like to hear from those closer to the release
>>> >>> >> >>>>>>> process on this subject - if it would make people's lives
>>> >>> >> >>>>>>> easier, we could merge everything into one repo/process.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> Integrations would be distributed as part of their respective
>>> >>> >> >>>>>>> packages (e.g. Arrow Dataset would optionally link to the driver
>>> >>> >> >>>>>>> manager). So the "part of Arrow 10.0.0" aspect means having a
>>> >>> >> >>>>>>> stable interface for adbc.h, and getting the Flight SQL drivers
>>> >>> >> >>>>>>> into the main repo.
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> [1]: <https://github.com/apache/arrow-adbc/issues/53>
>>> >>> >> >>>>>>>
>>> >>> >> >>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>> >>> >> >>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>> >>> >> >>>>>>>> "David Li" <lidav...@apache.org> wrote:
>>> >>> >> >>>>>>>>> Since it's been a while, I'd like to give an update. There are
>>> >>> >> >>>>>>>>> also a few questions I have around distribution.
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> Currently:
>>> >>> >> >>>>>>>>> - Supported in C, Java, and Python.
>>> >>> >> >>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and
>>> >>> >> >>>>>>>>>   SQLite, with a draft of a libpq (Postgres) driver (using
>>> >>> >> >>>>>>>>>   nanoarrow).
>>> >>> >> >>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>> >>> >> >>>>>>>>> - For Python, there's low-level bindings to the C API, and the
>>> >>> >> >>>>>>>>>   DBAPI interface on top of that (+a few extension methods
>>> >>> >> >>>>>>>>>   resembling DuckDB/Turbodbc).
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> There's drafts of integration with Ibis [1], DBI (R), and
>>> >>> >> >>>>>>>>> DuckDB. (I'd like to thank Hannes and Kirill for their comments,
>>> >>> >> >>>>>>>>> as well as Antoine, Dewey, and Matt here.)
>>> >>> >> >>>>>>>>>
>>> >>> >> >>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion.
>>> >>> >> >>>>>>>>> However, I'm not sure how we would like to handle packaging and
>>> >>> >> >>>>>>>>> distribution. In particular, there are several sub-components
>>> >>> >> >>>>>>>>> for each language (the driver manager + the drivers), increasing
>>> >>> >> >>>>>>>>> the work. Any thoughts here?
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> Sorry, forgot to answer here. But I think your question is too
>>> >>> >> >>>>>>>> broadly formulated. It probably deserves a case-by-case
>>> >>> >> >>>>>>>> discussion, IMHO.
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>>> I'm also wondering how we want to handle this in terms of
>>> >>> >> >>>>>>>>> specification - I assume we'd consider the core header file/Java
>>> >>> >> >>>>>>>>> interfaces a spec like the C Data Interface/Flight RPC, and vote
>>> >>> >> >>>>>>>>> on them/mirror them into the format/ directory?
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> That sounds like the right way to me indeed.
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> Regards
>>> >>> >> >>>>>>>>
>>> >>> >> >>>>>>>> Antoine.
>>> >>> >>
>>> >>>
>>>
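
P.S. For anyone skimming the thread, here is a rough sketch of how conventional-commit messages could drive the independent semver scheme discussed above. It is only an illustration, not an existing tool: the function names (`bump_for`, `next_version`) and the example scopes are made up, and the mapping (`feat` bumps minor, `fix` bumps patch, a `!` marker or a `BREAKING CHANGE:` footer bumps major) is my reading of the Conventional Commits convention.

```python
import re

# Conventional commit header: type, optional (scope), optional "!", ": ", description.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)


def bump_for(message):
    """Return 'major', 'minor', 'patch', or None for one commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        # Not a conventional commit; this sketch simply ignores it.
        return None
    has_breaking_footer = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
        for line in lines[1:]
    )
    if match.group("breaking") or has_breaking_footer:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return None


def next_version(current, messages):
    """Compute the next MAJOR.MINOR.PATCH version for one component."""
    major, minor, patch = (int(part) for part in current.split("."))
    bumps = {bump_for(message) for message in messages}
    if "major" in bumps:
        return f"{major + 1}.0.0"
    if "minor" in bumps:
        return f"{major}.{minor + 1}.0"
    if "patch" in bumps:
        return f"{major}.{minor}.{patch + 1}"
    return current  # docs/chore-only changes leave the version alone
```

Run per component, this follows Kou's example: a `feat` commit takes adbc_driver_postgres from 1.0.0 to 1.1.0, a `fix` commit takes adbc_driver_manager from 1.0.0 to 1.0.1, and a later breaking change takes it to 2.0.0. How commits would be attributed to components (e.g. via the scope) is left open here.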