+1 to independent, semver versioning for ADBC. I would propose we use Conventional Commits-style [1] commit messages for the PR commits (I assume squash + merge) so we can automate the versioning or double-check manual versioning.
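
As a rough illustration of the automation, here is a minimal Python sketch that maps squashed commit messages to a semver bump, using the mapping Conventional Commits describes (`fix` is a patch bump, `feat` is a minor bump, and a `!` marker or a `BREAKING CHANGE:` footer is a major bump). The names `bump_kind` and `next_version` are hypothetical and not part of any existing ADBC tooling; a real setup would more likely rely on an existing release tool.

```python
import re

# Header of a Conventional Commits message:
#   type, optional "(scope)", optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: .+"
)


def bump_kind(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for one squashed commit."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        # Not a conventional commit; leave it for manual review.
        return "none"
    # A "!" before the colon or a BREAKING CHANGE footer marks an
    # incompatible change.
    has_footer = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
        for line in lines[1:]
    )
    if match.group("breaking") or has_footer:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match.group("type").lower(), "none")


def next_version(version: str, messages: list[str]) -> str:
    """Apply the largest bump implied by the commits since the last release."""
    major, minor, patch = (int(part) for part in version.split("."))
    kinds = {bump_kind(message) for message in messages}
    if "major" in kinds:
        return f"{major + 1}.0.0"
    if "minor" in kinds:
        return f"{major}.{minor + 1}.0"
    if "patch" in kinds:
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Applied to the release sequence in Kou's example quoted below, a `feat` commit would take adbc_driver_postgres from 1.0.0 to 1.1.0, a `fix` commit would take adbc_driver_manager from 1.0.0 to 1.0.1, and a commit marked with `!` would take it to 2.0.0. The same function could also be run in CI to double-check a manually chosen version.
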
[1]: https://www.conventionalcommits.org/

On Thu, Sep 8, 2022 at 6:05 PM David Li <lidav...@apache.org> wrote:

> Thanks all, I've updated the header with the proposed versioning scheme.
>
> At this point I believe the core definitions are ready. (Note that I'm
> explicitly punting on [1][2][3] here.) Absent further comments, I'd like to
> do the following:
>
> - Start a vote on mirroring adbc.h to arrow/format, as well as adding
>   docs/source/format/ADBC.rst that describes the header, the Java interface,
>   the Go interface, and the versioning scheme (I will put up a PR beforehand)
> - Begin work on CI/packaging, with a release hopefully coinciding with
>   Arrow 10.0.0
> - Begin work on changes to the main repository, also hopefully in time for
>   10.0.0 (moving the Flight SQL driver to be part of apache/arrow; exposing
>   it in PyArrow; possibly also exposing Acero via ADBC)
>
> [1]: https://github.com/apache/arrow-adbc/issues/46
> [2]: https://github.com/apache/arrow-adbc/issues/55
> [3]: https://github.com/apache/arrow-adbc/issues/59
>
> On Sat, Sep 3, 2022, at 18:36, Matthew Topol wrote:
> > +1 from me on the strategy proposed by Kou.
> >
> > That would be my preference also. I agree it is preferable to be versioned
> > independently.
> >
> > --Matt
> >
> > On Sat, Sep 3, 2022, 6:24 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> >> Hi,
> >>
> >> > Do we have a preference for versioning strategy? Should we
> >> > proceed in lockstep with the Arrow C++ library et al. and
> >> > release "ADBC 1.0.0" (the API standard) with "drivers
> >> > version 10.0.0", or use an independent versioning scheme?
> >> > (For example, release API standard and components at
> >> > "1.0.0". Then further releases of components that do not
> >> > change the spec would be "1.1", "1.2", ...; if/when we
> >> > change the spec, start over with "2.0", "2.1", ...)
> >>
> >> I like an independent versioning schema. I assume that ADBC
> >> doesn't need backward incompatible changes frequently.
> >> How about incrementing major version only when ADBC needs
> >> any backward incompatible changes?
> >>
> >> e.g.:
> >>
> >> 1. Release ADBC (the API standard) 1.0.0
> >> 2. Release adbc_driver_manager 1.0.0
> >> 3. Release adbc_driver_postgres 1.0.0
> >> 4. Add a new feature to adbc_driver_postgres without
> >>    any backward incompatible changes
> >> 5. Release adbc_driver_postgres 1.1.0
> >> 6. Fix a bug in adbc_driver_manager without
> >>    any backward incompatible changes
> >> 7. Release adbc_driver_manager 1.0.1
> >> 8. Add a backward incompatible change to adbc_driver_manager
> >> 9. Release adbc_driver_manager 2.0.0
> >> 10. Add a new feature to ADBC without any
> >>     backward incompatible changes
> >> 11. Release ADBC (the API standard) 1.1.0
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In <7b20d730-b85e-4818-b99e-3335c40c2...@www.fastmail.com>
> >>   "Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022 16:36:43 -0400,
> >>   "David Li" <lidav...@apache.org> wrote:
> >>
> >> > Following up here with some specific questions:
> >> >
> >> > Matt Topol added some Go definitions [1] (thanks!) I'd assume we want to
> >> > vote on those as well?
> >> >
> >> > How should the process work for Java/Go? For C/C++, I assume we'd treat
> >> > it like the C Data Interface and copy adbc.h to format/ after a vote, and
> >> > then vote on releases of components. Or do we really only consider the C
> >> > header as the 'format', with the others being language-specific affordances?
> >> >
> >> > What about for Java and for Go? We could vote on and tag a release for
> >> > Go, and add a documentation page that links to the Java/Go definitions at a
> >> > specific revision (as the equivalent 'format' definition for Java/Go)? Or
> >> > would we vendor the entire Java module/Go package as the 'format'?
> >> >
> >> > Do we have a preference for versioning strategy? Should we proceed in
> >> > lockstep with the Arrow C++ library et al.
> >> > and release "ADBC 1.0.0" (the API standard) with "drivers version 10.0.0",
> >> > or use an independent versioning scheme? (For example, release API standard
> >> > and components at "1.0.0". Then further releases of components that do not
> >> > change the spec would be "1.1", "1.2", ...; if/when we change the spec,
> >> > start over with "2.0", "2.1", ...)
> >> >
> >> > [1]: https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go
> >> >
> >> > -David
> >> >
> >> > On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
> >> >> Hi,
> >> >>
> >> >> OK. I'll send pull requests for GLib and Ruby soon.
> >> >>
> >> >>> I'm curious if you have a particular use case in mind.
> >> >>
> >> >> I don't have any production-ready use case yet but I want to
> >> >> implement an Active Record adapter for ADBC. Active Record
> >> >> is the O/R mapper for Ruby on Rails. Implementing Web
> >> >> application by Ruby on Rails is one of major Ruby use
> >> >> cases. So providing Active Record interface for ADBC will
> >> >> increase Apache Arrow users in Ruby community.
> >> >>
> >> >> NOTE: Generally, Ruby on Rails users don't process large
> >> >> data but they sometimes need to process large (medium?) data
> >> >> in a batch process. Active Record adapter for ADBC may be
> >> >> useful for such use case.
> >> >>
> >> >>> There's a little bit more API cleanup to do [1]. If you
> >> >>> have comments on that or anything else, I'd appreciate
> >> >>> them. Otherwise, pull requests would also be appreciated.
> >> >>
> >> >> OK. I'll open issues/pull requests when I find
> >> >> something. For now, I think that "MODULE" type library
> >> >> instead of "SHARED" type library in CMake terminology
> >> >> [cmake] is better for driver modules. (I'll open an issue
> >> >> for this later.)
> >> >>
> >> >> [cmake]: https://cmake.org/cmake/help/latest/command/add_library.html
> >> >>
> >> >>
> >> >> Thanks,
> >> >> --
> >> >> kou
> >> >>
> >> >> In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
> >> >>   "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022 15:28:56 -0400,
> >> >>   "David Li" <lidav...@apache.org> wrote:
> >> >>
> >> >>> I would be very happy to see GLib/Ruby bindings! I'm curious if you
> >> >>> have a particular use case in mind.
> >> >>>
> >> >>> There's a little bit more API cleanup to do [1]. If you have comments
> >> >>> on that or anything else, I'd appreciate them. Otherwise, pull requests
> >> >>> would also be appreciated.
> >> >>>
> >> >>> [1]: https://github.com/apache/arrow-adbc/issues/79
> >> >>>
> >> >>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
> >> >>>> Hi,
> >> >>>>
> >> >>>> Thanks for sharing the current status!
> >> >>>> I understand.
> >> >>>>
> >> >>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
> >> >>>> before we release the first version? (I want to use ADBC
> >> >>>> from Ruby.) Or should I wait for the first release? If I can
> >> >>>> work on it now, I'll open pull requests for it.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> --
> >> >>>> kou
> >> >>>>
> >> >>>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
> >> >>>>   "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
> >> >>>>   "David Li" <lidav...@apache.org> wrote:
> >> >>>>
> >> >>>>> Thank you Kou!
> >> >>>>>
> >> >>>>> At least initially, I don't think I'll be able to complete the
> >> >>>>> Dataset integration in time. So 10.0.0 probably won't ship with a hard
> >> >>>>> dependency. That said I am hoping to have PyArrow take an optional
> >> >>>>> dependency (so Flight SQL can finally be available from Python).
> >> >>>>>
> >> >>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
> >> >>>>>> to be released before apache/arrow is released so that
> >> >>>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
> >> >>>>>> .deb/.rpm.
> >> >>>>>>
> >> >>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
> >> >>>>>> apache/arrow's .deb/.rpm needs to depend on
> >> >>>>>> apache/arrow-adbc's .deb/.rpm.)
> >> >>>>>>
> >> >>>>>> We can add .deb/.rpm related files
> >> >>>>>> (dev/tasks/linux-packages/ in apache/arrow) to
> >> >>>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
> >> >>>>>>
> >> >>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
> >> >>>>>>
> >> >>>>>> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
> >> >>>>>> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
> >> >>>>>>
> >> >>>>>> I can work on it in apache/arrow-adbc.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> --
> >> >>>>>> kou
> >> >>>>>>
> >> >>>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
> >> >>>>>>   "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
> >> >>>>>>   "David Li" <lidav...@apache.org> wrote:
> >> >>>>>>
> >> >>>>>>> Fair enough, thank you. I'll try to expand a bit.
> >> >>>>>>> (Sorry for the wall of text that follows…)
> >> >>>>>>>
> >> >>>>>>> These are the components:
> >> >>>>>>>
> >> >>>>>>> - Core adbc.h header
> >> >>>>>>> - Driver manager for C/C++
> >> >>>>>>> - Flight SQL-based driver
> >> >>>>>>> - Postgres-based driver (WIP)
> >> >>>>>>> - SQLite-based driver (more of a testbed for me than an actual
> >> >>>>>>>   component - I don't think we'd actually distribute this)
> >> >>>>>>> - Java core interfaces
> >> >>>>>>> - Java driver manager
> >> >>>>>>> - Java JDBC-based driver
> >> >>>>>>> - Java Flight SQL-based driver
> >> >>>>>>> - Python driver manager
> >> >>>>>>>
> >> >>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL
> >> >>>>>>> drivers get moved to the main Arrow repo and distributed as part of the
> >> >>>>>>> regular Arrow releases.
> >> >>>>>>>
> >> >>>>>>> For the rest of the components: they could be packaged
> >> >>>>>>> individually, but versioned and released together. Also, each C/C++ driver
> >> >>>>>>> probably needs a corresponding Python package so Python users do not have
> >> >>>>>>> to futz with shared library configurations. (See [1].) So for instance,
> >> >>>>>>> installing PyArrow would also give you the Flight SQL driver, and
> >> >>>>>>> `pip install adbc_postgres` would get you the Postgres-based driver.
> >> >>>>>>>
> >> >>>>>>> That would mean setting up separate CI, release, etc. (and
> >> >>>>>>> eventually linking Crossbow & Conbench as well?). That does mean
> >> >>>>>>> duplication of effort, but the trade off is avoiding bloating the main
> >> >>>>>>> release process even further. However, I'd like to hear from those closer
> >> >>>>>>> to the release process on this subject - if it would make people's lives
> >> >>>>>>> easier, we could merge everything into one repo/process.
> >> >>>>>>>
> >> >>>>>>> Integrations would be distributed as part of their respective
> >> >>>>>>> packages (e.g. Arrow Dataset would optionally link to the driver manager).
> >> >>>>>>> So the "part of Arrow 10.0.0" aspect means having a stable interface for
> >> >>>>>>> adbc.h, and getting the Flight SQL drivers into the main repo.
> >> >>>>>>>
> >> >>>>>>> [1]: https://github.com/apache/arrow-adbc/issues/53
> >> >>>>>>>
> >> >>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
> >> >>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
> >> >>>>>>>> "David Li" <lidav...@apache.org> wrote:
> >> >>>>>>>>> Since it's been a while, I'd like to give an update. There are
> >> >>>>>>>>> also a few questions I have around distribution.
> >> >>>>>>>>>
> >> >>>>>>>>> Currently:
> >> >>>>>>>>> - Supported in C, Java, and Python.
> >> >>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and
> >> >>>>>>>>>   SQLite, with a draft of a libpq (Postgres) driver (using nanoarrow).
> >> >>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
> >> >>>>>>>>> - For Python, there's low-level bindings to the C API, and the
> >> >>>>>>>>>   DBAPI interface on top of that (+a few extension methods resembling
> >> >>>>>>>>>   DuckDB/Turbodbc).
> >> >>>>>>>>>
> >> >>>>>>>>> There's drafts of integration with Ibis [1], DBI (R), and
> >> >>>>>>>>> DuckDB. (I'd like to thank Hannes and Kirill for their comments, as well as
> >> >>>>>>>>> Antoine, Dewey, and Matt here.)
> >> >>>>>>>>>
> >> >>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion.
> >> >>>>>>>>> However, I'm not sure how we would like to handle packaging and
> >> >>>>>>>>> distribution. In particular, there are several sub-components for each
> >> >>>>>>>>> language (the driver manager + the drivers), increasing the work. Any
> >> >>>>>>>>> thoughts here?
> >> >>>>>>>>
> >> >>>>>>>> Sorry, forgot to answer here. But I think your question is too broadly
> >> >>>>>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
> >> >>>>>>>>
> >> >>>>>>>>> I'm also wondering how we want to handle this in terms of
> >> >>>>>>>>> specification - I assume we'd consider the core header file/Java interfaces
> >> >>>>>>>>> a spec like the C Data Interface/Flight RPC, and vote on them/mirror them
> >> >>>>>>>>> into the format/ directory?
> >> >>>>>>>>
> >> >>>>>>>> That sounds like the right way to me indeed.
> >> >>>>>>>>
> >> >>>>>>>> Regards
> >> >>>>>>>>
> >> >>>>>>>> Antoine.
> >> >