Hi,

Thanks for sharing the current status!
I understand.

BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
before we release the first version? (I want to use ADBC
from Ruby.) Or should I wait for the first release? If I can
work on it now, I'll open pull requests for it.

Thanks,
-- 
kou

In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
  "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
  "David Li" <lidav...@apache.org> wrote:

> Thank you Kou!
> 
> At least initially, I don't think I'll be able to complete the Dataset 
> integration in time. So 10.0.0 probably won't ship with a hard dependency. 
> That said I am hoping to have PyArrow take an optional dependency (so Flight 
> SQL can finally be available from Python).
> 
> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>> Hi,
>>
>> As a maintainer of Linux packages, I want apache/arrow-adbc
>> to be released before apache/arrow is released so that
>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>> .deb/.rpm.
>>
>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>> apache/arrow's .deb/.rpm needs to depend on
>> apache/arrow-adbc's .deb/.rpm.)
>>
>> We can add .deb/.rpm related files
>> (dev/tasks/linux-packages/ in apache/arrow) to
>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>>
>> FYI: I did it for datafusion-contrib/datafusion-c:
>>
>> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
>> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>>
>> I can work on it in apache/arrow-adbc.
>>
>>
>> Thanks,
>> -- 
>> kou
>>
>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>>   "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
>>   "David Li" <lidav...@apache.org> wrote:
>>
>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of 
>>> text that follows…)
>>> 
>>> These are the components:
>>> 
>>> - Core adbc.h header
>>> - Driver manager for C/C++
>>> - Flight SQL-based driver
>>> - Postgres-based driver (WIP)
>>> - SQLite-based driver (more of a testbed for me than an actual component - 
>>> I don't think we'd actually distribute this)
>>> - Java core interfaces
>>> - Java driver manager
>>> - Java JDBC-based driver
>>> - Java Flight SQL-based driver
>>> - Python driver manager
>>> 
>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL drivers 
>>> get moved to the main Arrow repo and distributed as part of the regular 
>>> Arrow releases.
>>> 
>>> For the rest of the components: they could be packaged individually, but 
>>> versioned and released together. Also, each C/C++ driver probably needs a 
>>> corresponding Python package so Python users do not have to futz with 
>>> shared library configurations. (See [1].) So for instance, installing 
>>> PyArrow would also give you the Flight SQL driver, and `pip install 
>>> adbc_postgres` would get you the Postgres-based driver.
>>> 
>>> That would mean setting up separate CI, release, etc. (and eventually 
>>> linking Crossbow & Conbench as well?). That does mean duplication of 
>>> effort, but the trade-off is avoiding bloating the main release process 
>>> even further. However, I'd like to hear from those closer to the release 
>>> process on this subject - if it would make people's lives easier, we could 
>>> merge everything into one repo/process.
>>> 
>>> Integrations would be distributed as part of their respective packages 
>>> (e.g. Arrow Dataset would optionally link to the driver manager). So the 
>>> "part of Arrow 10.0.0" aspect means having a stable interface for adbc.h, 
>>> and getting the Flight SQL drivers into the main repo.
>>> 
>>> [1]: https://github.com/apache/arrow-adbc/issues/53
>>> 
>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>>> "David Li" <lidav...@apache.org> wrote:
>>>>> Since it's been a while, I'd like to give an update. There are also a few 
>>>>> questions I have around distribution.
>>>>> 
>>>>> Currently:
>>>>> - Supported in C, Java, and Python.
>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and SQLite, 
>>>>> with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>>> - For Python, there are low-level bindings to the C API, and the DBAPI 
>>>>> interface on top of that (+a few extension methods resembling 
>>>>> DuckDB/Turbodbc).
>>>>>  
>>>>> There are drafts of integration with Ibis [1], DBI (R), and DuckDB. (I'd 
>>>>> like to thank Hannes and Kirill for their comments, as well as Antoine, 
>>>>> Dewey, and Matt here.)
>>>>> 
>>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm not 
>>>>> sure how we would like to handle packaging and distribution. In 
>>>>> particular, there are several sub-components for each language (the 
>>>>> driver manager + the drivers), increasing the work. Any thoughts here?
>>>>
>>>> Sorry, forgot to answer here. But I think your question is too broadly
>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>>
>>>>> I'm also wondering how we want to handle this in terms of specification - 
>>>>> I assume we'd consider the core header file/Java interfaces a spec like 
>>>>> the C Data Interface/Flight RPC, and vote on them/mirror them into the 
>>>>> format/ directory?
>>>>
>>>> That sounds like the right way to me indeed.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
