On Mon, Sep 12, 2022 at 12:44 PM David Li <lidav...@apache.org> wrote:
> I like this idea. I would also like to set up some sort of automated ABI checker as well (the options I found were GPL/LGPL so I need to figure out how to proceed).
>

You should be able to use GPL software in CI, that's no problem. You can even depend on GPL software as long as it is "optional":
https://www.apache.org/legal/resolved.html#optional

But this would not even count as that since the ABI checker wouldn't be required to use the software.

Neal

> I can put up a PR later that formalizes these guidelines in CONTRIBUTING.md. It looks like there's a pre-commit hook for this sort of thing too, which'll let us enforce it in CI!
>
> On Mon, Sep 12, 2022, at 10:18, Matthew Topol wrote:
> > Automated semver would be ideal if we can do it.....
> >
> > There's quite a lot of utilities that exist which would automatically handle the versioning if we're using conventional commits.
> >
> > On Mon, Sep 12 2022 at 02:26:15 PM +0200, Jacob Wujciak <ja...@voltrondata.com.INVALID> wrote:
> >> + 1 to independent, semver versioning for adbc.
> >> I would propose we use conventional commit style [1] commit messages for the pr commits (I assume squash + merge) so we can automate the versioning|double check manual versioning.
> >>
> >> [1]: <https://www.conventionalcommits.org/>
> >>
> >> On Thu, Sep 8, 2022 at 6:05 PM David Li <lidav...@apache.org> wrote:
> >>
> >>> Thanks all, I've updated the header with the proposed versioning scheme.
> >>>
> >>> At this point I believe the core definitions are ready. (Note that I'm explicitly punting on [1][2][3] here.)
> >>> Absent further comments, I'd like to do the following:
> >>>
> >>> - Start a vote on mirroring adbc.h to arrow/format, as well as adding docs/source/format/ADBC.rst that describes the header, the Java interface, the Go interface, and the versioning scheme (I will put up a PR beforehand)
> >>> - Begin work on CI/packaging, with a release hopefully coinciding with Arrow 10.0.0
> >>> - Begin work on changes to the main repository, also hopefully in time for 10.0.0 (moving the Flight SQL driver to be part of apache/arrow; exposing it in PyArrow; possibly also exposing Acero via ADBC)
> >>>
> >>> [1]: <https://github.com/apache/arrow-adbc/issues/46>
> >>> [2]: <https://github.com/apache/arrow-adbc/issues/55>
> >>> [3]: <https://github.com/apache/arrow-adbc/issues/59>
> >>>
> >>> On Sat, Sep 3, 2022, at 18:36, Matthew Topol wrote:
> >>> > +1 from me on the strategy proposed by Kou.
> >>> >
> >>> > That would be my preference also. I agree it is preferable to be versioned independently.
> >>> >
> >>> > --Matt
> >>> >
> >>> > On Sat, Sep 3, 2022, 6:24 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >>> >
> >>> >> Hi,
> >>> >>
> >>> >> > Do we have a preference for versioning strategy? Should we proceed in lockstep with the Arrow C++ library et al. and release "ADBC 1.0.0" (the API standard) with "drivers version 10.0.0", or use an independent versioning scheme? (For example, release API standard and components at "1.0.0". Then further releases of components that do not change the spec would be "1.1", "1.2", ...; if/when we change the spec, start over with "2.0", "2.1", ...)
> >>> >>
> >>> >> I like an independent versioning schema. I assume that ADBC doesn't need backward incompatible changes frequently. How about incrementing major version only when ADBC needs any backward incompatible changes?
> >>> >>
> >>> >> e.g.:
> >>> >>
> >>> >> 1. Release ADBC (the API standard) 1.0.0
> >>> >> 2. Release adbc_driver_manager 1.0.0
> >>> >> 3. Release adbc_driver_postgres 1.0.0
> >>> >> 4. Add a new feature to adbc_driver_postgres without any backward incompatible changes
> >>> >> 5. Release adbc_driver_postgres 1.1.0
> >>> >> 6. Fix a bug in adbc_driver_manager without any backward incompatible changes
> >>> >> 7. Release adbc_driver_manager 1.0.1
> >>> >> 8. Add a backward incompatible change to adbc_driver_manager
> >>> >> 9. Release adbc_driver_manager 2.0.0
> >>> >> 10. Add a new feature to ADBC without any backward incompatible changes
> >>> >> 11. Release ADBC (the API standard) 1.1.0
> >>> >>
> >>> >>
> >>> >> Thanks,
> >>> >> --
> >>> >> kou
> >>> >>
> >>> >> In <7b20d730-b85e-4818-b99e-3335c40c2...@www.fastmail.com>
> >>> >> "Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022 16:36:43 -0400,
> >>> >> "David Li" <lidav...@apache.org> wrote:
> >>> >>
> >>> >> > Following up here with some specific questions:
> >>> >> >
> >>> >> > Matt Topol added some Go definitions [1] (thanks!) I'd assume we want to vote on those as well?
> >>> >> >
> >>> >> > How should the process work for Java/Go? For C/C++, I assume we'd treat it like the C Data Interface and copy adbc.h to format/ after a vote, and then vote on releases of components. Or do we really only consider the C header as the 'format', with the others being language-specific affordances?
> >>> >> >
> >>> >> > What about for Java and for Go?
> >>> >> > We could vote on and tag a release for Go, and add a documentation page that links to the Java/Go definitions at a specific revision (as the equivalent 'format' definition for Java/Go)? Or would we vendor the entire Java module/Go package as the 'format'?
> >>> >> >
> >>> >> > Do we have a preference for versioning strategy? Should we proceed in lockstep with the Arrow C++ library et al. and release "ADBC 1.0.0" (the API standard) with "drivers version 10.0.0", or use an independent versioning scheme? (For example, release API standard and components at "1.0.0". Then further releases of components that do not change the spec would be "1.1", "1.2", ...; if/when we change the spec, start over with "2.0", "2.1", ...)
> >>> >> >
> >>> >> > [1]: <https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go>
> >>> >> >
> >>> >> > -David
> >>> >> >
> >>> >> > On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
> >>> >> >> Hi,
> >>> >> >>
> >>> >> >> OK. I'll send pull requests for GLib and Ruby soon.
> >>> >> >>
> >>> >> >>> I'm curious if you have a particular use case in mind.
> >>> >> >>
> >>> >> >> I don't have any production-ready use case yet but I want to implement an Active Record adapter for ADBC. Active Record is the O/R mapper for Ruby on Rails. Implementing Web application by Ruby on Rails is one of major Ruby use cases. So providing Active Record interface for ADBC will increase Apache Arrow users in Ruby community.
> >>> >> >>
> >>> >> >> NOTE: Generally, Ruby on Rails users don't process large data but they sometimes need to process large (medium?) data in a batch process. Active Record adapter for ADBC may be useful for such use case.
> >>> >> >>
> >>> >> >>> There's a little bit more API cleanup to do [1]. If you have comments on that or anything else, I'd appreciate them. Otherwise, pull requests would also be appreciated.
> >>> >> >>
> >>> >> >> OK. I'll open issues/pull requests when I find something. For now, I think that "MODULE" type library instead of "SHARED" type library in CMake terminology [cmake] is better for driver modules. (I'll open an issue for this later.)
> >>> >> >>
> >>> >> >> [cmake]: <https://cmake.org/cmake/help/latest/command/add_library.html>
> >>> >> >>
> >>> >> >>
> >>> >> >> Thanks,
> >>> >> >> --
> >>> >> >> kou
> >>> >> >>
> >>> >> >> In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
> >>> >> >> "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022 15:28:56 -0400,
> >>> >> >> "David Li" <lidav...@apache.org> wrote:
> >>> >> >>
> >>> >> >>> I would be very happy to see GLib/Ruby bindings! I'm curious if you have a particular use case in mind.
> >>> >> >>>
> >>> >> >>> There's a little bit more API cleanup to do [1]. If you have comments on that or anything else, I'd appreciate them. Otherwise, pull requests would also be appreciated.
> >>> >> >>>
> >>> >> >>> [1]: <https://github.com/apache/arrow-adbc/issues/79>
> >>> >> >>>
> >>> >> >>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
> >>> >> >>>> Hi,
> >>> >> >>>>
> >>> >> >>>> Thanks for sharing the current status!
> >>> >> >>>> I understand.
> >>> >> >>>>
> >>> >> >>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc before we release the first version? (I want to use ADBC from Ruby.) Or should I wait for the first release? If I can work on it now, I'll open pull requests for it.
> >>> >> >>>>
> >>> >> >>>> Thanks,
> >>> >> >>>> --
> >>> >> >>>> kou
> >>> >> >>>>
> >>> >> >>>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
> >>> >> >>>> "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
> >>> >> >>>> "David Li" <lidav...@apache.org> wrote:
> >>> >> >>>>
> >>> >> >>>>> Thank you Kou!
> >>> >> >>>>>
> >>> >> >>>>> At least initially, I don't think I'll be able to complete the Dataset integration in time. So 10.0.0 probably won't ship with a hard dependency. That said I am hoping to have PyArrow take an optional dependency (so Flight SQL can finally be available from Python).
> >>> >> >>>>>
> >>> >> >>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
> >>> >> >>>>>> Hi,
> >>> >> >>>>>>
> >>> >> >>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc to be released before apache/arrow is released so that apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's .deb/.rpm.
> >>> >> >>>>>>
> >>> >> >>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc, apache/arrow's .deb/.rpm needs to depend on apache/arrow-adbc's .deb/.rpm.)
> >>> >> >>>>>>
> >>> >> >>>>>> We can add .deb/.rpm related files (dev/tasks/linux-packages/ in apache/arrow) to apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
> >>> >> >>>>>>
> >>> >> >>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
> >>> >> >>>>>>
> >>> >> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/tree/main/package>
> >>> >> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml>
> >>> >> >>>>>>
> >>> >> >>>>>> I can work on it in apache/arrow-adbc.
> >>> >> >>>>>>
> >>> >> >>>>>>
> >>> >> >>>>>> Thanks,
> >>> >> >>>>>> --
> >>> >> >>>>>> kou
> >>> >> >>>>>>
> >>> >> >>>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
> >>> >> >>>>>> "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
> >>> >> >>>>>> "David Li" <lidav...@apache.org> wrote:
> >>> >> >>>>>>
> >>> >> >>>>>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of text that follows…)
> >>> >> >>>>>>>
> >>> >> >>>>>>> These are the components:
> >>> >> >>>>>>>
> >>> >> >>>>>>> - Core adbc.h header
> >>> >> >>>>>>> - Driver manager for C/C++
> >>> >> >>>>>>> - Flight SQL-based driver
> >>> >> >>>>>>> - Postgres-based driver (WIP)
> >>> >> >>>>>>> - SQLite-based driver (more of a testbed for me than an actual component - I don't think we'd actually distribute this)
> >>> >> >>>>>>> - Java core interfaces
> >>> >> >>>>>>> - Java driver manager
> >>> >> >>>>>>> - Java JDBC-based driver
> >>> >> >>>>>>> - Java Flight SQL-based driver
> >>> >> >>>>>>> - Python driver manager
> >>> >> >>>>>>>
> >>> >> >>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL drivers get moved to the main Arrow repo and distributed as part of the regular Arrow releases.
> >>> >> >>>>>>>
> >>> >> >>>>>>> For the rest of the components: they could be packaged individually, but versioned and released together. Also, each C/C++ driver probably needs a corresponding Python package so Python users do not have to futz with shared library configurations. (See [1].) So for instance, installing PyArrow would also give you the Flight SQL driver, and `pip install adbc_postgres` would get you the Postgres-based driver.
> >>> >> >>>>>>>
> >>> >> >>>>>>> That would mean setting up separate CI, release, etc. (and eventually linking Crossbow & Conbench as well?). That does mean duplication of effort, but the trade off is avoiding bloating the main release process even further. However, I'd like to hear from those closer to the release process on this subject - if it would make people's lives easier, we could merge everything into one repo/process.
> >>> >> >>>>>>>
> >>> >> >>>>>>> Integrations would be distributed as part of their respective packages (e.g. Arrow Dataset would optionally link to the driver manager). So the "part of Arrow 10.0.0" aspect means having a stable interface for adbc.h, and getting the Flight SQL drivers into the main repo.
> >>> >> >>>>>>>
> >>> >> >>>>>>> [1]: <https://github.com/apache/arrow-adbc/issues/53>
> >>> >> >>>>>>>
> >>> >> >>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
> >>> >> >>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
> >>> >> >>>>>>>> "David Li" <lidav...@apache.org> wrote:
> >>> >> >>>>>>>>> Since it's been a while, I'd like to give an update. There are also a few questions I have around distribution.
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> Currently:
> >>> >> >>>>>>>>> - Supported in C, Java, and Python.
> >>> >> >>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and SQLite, with a draft of a libpq (Postgres) driver (using nanoarrow).
> >>> >> >>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
> >>> >> >>>>>>>>> - For Python, there's low-level bindings to the C API, and the DBAPI interface on top of that (+a few extension methods resembling DuckDB/Turbodbc).
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> There's drafts of integration with Ibis [1], DBI (R), and DuckDB. (I'd like to thank Hannes and Kirill for their comments, as well as Antoine, Dewey, and Matt here.)
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm not sure how we would like to handle packaging and distribution. In particular, there are several sub-components for each language (the driver manager + the drivers), increasing the work. Any thoughts here?
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Sorry, forgot to answer here. But I think your question is too broadly formulated. It probably deserves a case-by-case discussion, IMHO.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>> I'm also wondering how we want to handle this in terms of specification - I assume we'd consider the core header file/Java interfaces a spec like the C Data Interface/Flight RPC, and vote on them/mirror them into the format/ directory?
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> That sounds like the right way to me indeed.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Regards
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Antoine.
> >>> >>
> >>> >
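
For reference, the versioning rules discussed in this thread (Kou's numbered example plus Jacob's suggestion of conventional-commit messages) could be automated roughly as follows. This is only a minimal sketch in Python, not part of any ADBC tooling. The names `classify` and `next_version` are hypothetical, and the mapping (`feat` bumps minor, `fix` bumps patch, a `!` before the colon or a `BREAKING CHANGE:` footer bumps major) is the usual Conventional Commits reading, assumed here rather than decided anywhere above.

```python
import re

# Hypothetical helpers for illustration; not part of ADBC or any existing tool.
# Matches a Conventional Commits header such as "feat(driver)!: add option".
HEADER = re.compile(r"^(?P<type>[a-zA-Z]+)(\([^)]*\))?(?P<bang>!)?: .+")

# Ordering used to pick the largest bump among several commits.
LEVELS = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def classify(message: str) -> str:
    """Return the semver bump implied by a single commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return "none"  # not a conventional commit; ignored in this sketch
    # "!" before the colon or a BREAKING CHANGE footer marks an incompatible change.
    breaking = bool(match.group("bang")) or any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
        for line in lines[1:]
    )
    if breaking:
        return "major"
    kind = match.group("type").lower()
    if kind == "feat":
        return "minor"
    if kind == "fix":
        return "patch"
    return "none"


def next_version(current: str, messages: list[str]) -> str:
    """Apply the largest bump found in `messages` to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    bump = max((classify(m) for m in messages), key=LEVELS.__getitem__, default="none")
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


if __name__ == "__main__":
    # Mirrors steps 4-9 of Kou's example (commit subjects are made up).
    print(next_version("1.0.0", ["feat: add a new driver option"]))    # 1.1.0
    print(next_version("1.0.0", ["fix: handle empty result set"]))     # 1.0.1
    print(next_version("1.0.1", ["feat!: rename connection option"]))  # 2.0.0
```

Each component (the API standard, adbc_driver_manager, adbc_driver_postgres, ...) would keep its own version string and feed only its own commits to `next_version`, which is what keeps the versions independent of each other and of the Arrow release number.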