Hi David,

Thanks for raising this proposal; it would be a great addition. I was
actually planning to discuss this with you, as I believe it will support
AIP-99 by allowing this hook to provide rich context and sample data for
LLMs.

I am +1 for this :)

Regards
Pavan

On Tue, Mar 3, 2026 at 3:53 PM Blain David <[email protected]> wrote:

> Hello everyone,
>
> Following some initial discussions with Jarek Potiuk and a previously
> opened PR, I would like to formally propose the introduction of an Apache
> Arrow / ADBC provider for Airflow.
>
> Context & Motivation:
>
> While Airflow has a rich set of database-specific providers, the data
> ecosystem is rapidly shifting toward ADBC (Arrow Database Connectivity).
> ADBC solves many of the "bottleneck" issues associated with traditional
> DB-API 2.0, ODBC or JDBC drivers by leveraging columnar data access and
> Arrow-native memory representation.
>
> We are seeing significant momentum here:
>
>
>   *   Performance: Significant reduction in serialization overhead for
> bulk operations. While results vary by driver maturity and server-side
> native Arrow support (e.g., Arrow Flight endpoints), ADBC provides a much higher
> performance ceiling than standard PEP 249 drivers.
>   *   Standardization: Systems like Snowflake, Apache DataFusion and
> DuckDB are increasingly treating Arrow as a first-class citizen.
>   *   Future-proofing: Tools like dbt-fusion and various lakehouse
> architectures are moving toward Arrow-based execution.
>
> The Proposal:
>
> I propose adding an apache-airflow-providers-apache-arrow provider (or
> similar) that introduces an AdbcHook.
>
> Key Technical Highlights:
>
>
>   *   Compatibility: By implementing DbApiHook, the AdbcHook will be
> immediately compatible with existing SQL operators.
>   *   Efficiency: It will offer a high-performance alternative to
> traditional row-based drivers without requiring users to rewrite their DAG
> logic.
>   *   Scope: Focus on providing a standardized interface for Arrow-native
> bulk reads and writes (future enhancement in AdbcHook).
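
To illustrate the compatibility point above, here is a minimal sketch, not
the code from the draft PR. SketchDbApiHook and run_query are hypothetical
names, and the standard-library sqlite3 module stands in for an ADBC driver
(the ADBC Python packages expose a DB-API 2.0 style interface, which is the
assumption this relies on). The method names get_conn and get_records echo
those on Airflow's DbApiHook.

```python
import sqlite3
from typing import Any, Callable


class SketchDbApiHook:
    """Hypothetical stand-in for the role DbApiHook plays; not Airflow's class."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        # `connect` is any zero-argument factory returning a DB-API 2.0
        # connection. An ADBC driver's dbapi connect function could be
        # passed here instead of sqlite3 (assumption, not shown).
        self._connect = connect

    def get_conn(self) -> Any:
        return self._connect()

    def get_records(self, sql: str, parameters: tuple = ()) -> list[tuple]:
        # Only PEP 249 calls are used, so the driver behind the
        # connection is interchangeable.
        conn = self.get_conn()
        try:
            cur = conn.cursor()
            cur.execute(sql, parameters)
            return cur.fetchall()
        finally:
            conn.close()


def run_query(hook: SketchDbApiHook, sql: str) -> list[tuple]:
    # Plays the part of a generic SQL operator: it talks to the hook
    # interface only and never to a specific driver.
    return hook.get_records(sql)


hook = SketchDbApiHook(lambda: sqlite3.connect(":memory:"))
rows = run_query(hook, "SELECT 1 AS x, 'a' AS y")
```

Because run_query depends only on the hook interface, swapping the
connection factory changes the driver without touching the calling code,
which is the property the proposal relies on for existing SQL operators.
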
>
> Community & Maintenance:
>
> I have already started the groundwork in a Draft PR (#52330).
>
> I believe this aligns with the project's goal of supporting
> high-performance data engineering patterns. I'm looking for feedback on:
>
>
>   *   Naming: Should this be a standalone adbc provider or part of an
> apache.arrow provider? I chose the latter, but this is open for discussion.
>   *   Scope: For now I am focusing purely on the Hook/Connection. Because
> it extends DbApiHook and implements all required methods, it is already
> directly usable in SQL operators.
>
> I'd love to gather your thoughts and gauge interest before moving to a
> formal voting thread.
>
> Draft PR: https://github.com/apache/airflow/pull/52330
>
> Best regards,
> David
>
