This is really cool, thanks for sharing Kaxil and Pavan.

Thanks & Regards,
Amogh Desai


On Thu, Mar 5, 2026 at 6:34 PM Kaxil Naik <[email protected]> wrote:

> Hi everyone,
>
> Pavan and I have been working on AIP-99, native agentic AI for Airflow 3.
> The first set of PRs has landed.
>
> The core idea: Airflow already has 350+ provider hooks, each
> pre-authenticated through connections. AIP-99 turns those hooks directly
> into AI agent tools.
>
> What's available now:
>
> 1. HookToolset: wraps any Airflow hook into AI-callable tools with
>    explicit allowed_methods:
>
>    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
>    from airflow.providers.common.ai.toolsets import HookToolset
>
>    HookToolset(
>        hook=S3Hook(aws_conn_id="my_aws"),
>        allowed_methods=["list_keys"],
>    )
>
> 2. SQLToolset: 4 curated database tools (list tables, describe schema,
>    execute query, fetch results) scoped to specific tables.
>
> 3. DataFusionToolset: lets AI agents query files on object stores (S3,
>    local filesystem, Iceberg) through Apache DataFusion. Agents get SQL
>    access to Parquet, CSV, and Avro files without loading them into a
>    database.
>
> 4. MCPToolset: connects to external MCP servers via Airflow connections.
>
> 5. Task decorators (Operators are also available :) ):
>    - @task.llm : single LLM call with structured output
>    - @task.agent : multi-step agent with tool access
>    - @task.llm_sql : text-to-SQL pipelines
>    - @task.llm_schema_compare : cross-database schema diffing
>
> LLM connections are configured through Airflow's standard connection
> model, supporting OpenAI, Anthropic, Google, Ollama, etc.
>
> HITL (Human-in-the-Loop) integration is also in progress as a draft PR.
>
> Project Board:
> - https://github.com/orgs/apache/projects/586
>
> Summit talk where we previewed this:
> https://www.youtube.com/watch?v=XSAzSDVUi2o
>
> Separate from the AI work, AIP-99 also adds an AnalyticsOperator powered
> by Apache DataFusion for high-performance SQL on object stores:
>
> - AnalyticsOperator: run SQL queries directly against S3, GCS, local
>   files, and Iceberg tables. Supports Parquet, CSV, Avro.
> - @task.analytics decorator: TaskFlow API support for the above.
> - Iceberg support via PyIceberg with Glue catalog integration.
>
> Pavan and I would love it if folks could start testing this and open
> GitHub issues for any bugs you run into. Our intention is to keep it at a
> 0.x version so we can iterate on it faster. Looking forward to feedback.
>
> Thanks,
> Kaxil
>
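
For anyone who wants a feel for the allowed_methods idea in the HookToolset item quoted above, here is a minimal, self-contained sketch of the allowlist pattern. It does not use Airflow or the real HookToolset: FakeS3Hook, expose_allowed_methods and call_tool are hypothetical names invented for this illustration, and the actual provider may work quite differently.

```python
from typing import Any, Callable, Dict, Iterable, List


class FakeS3Hook:
    """Hypothetical stand-in for a pre-authenticated hook (not Airflow's S3Hook)."""

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id

    def list_keys(self, bucket_name: str) -> List[str]:
        # Canned data so the sketch runs without any AWS credentials.
        return [f"{bucket_name}/a.parquet", f"{bucket_name}/b.parquet"]

    def delete_objects(self, bucket_name: str, keys: List[str]) -> None:
        # A destructive method that should never be reachable by an agent.
        raise RuntimeError("delete_objects must not be exposed as a tool")


def expose_allowed_methods(
    hook: Any, allowed_methods: Iterable[str]
) -> Dict[str, Callable[..., Any]]:
    """Return a name -> bound method mapping limited to the explicit allowlist."""
    tools: Dict[str, Callable[..., Any]] = {}
    for name in allowed_methods:
        method = getattr(hook, name, None)
        # Reject private names and anything that is not a callable attribute.
        if name.startswith("_") or not callable(method):
            raise ValueError(
                f"{name!r} is not a public method of {type(hook).__name__}"
            )
        tools[name] = method
    return tools


def call_tool(tools: Dict[str, Callable[..., Any]], name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call, refusing anything outside the allowlist."""
    if name not in tools:
        raise PermissionError(f"tool {name!r} is not in allowed_methods")
    return tools[name](**kwargs)


# Only list_keys is exposed; delete_objects stays out of the agent's reach.
tools = expose_allowed_methods(FakeS3Hook(conn_id="my_aws"), ["list_keys"])
keys = call_tool(tools, "list_keys", bucket_name="demo")
```

The point of the pattern is that the hook carries the credentials while the agent only ever sees the names on the allowlist, so asking for delete_objects fails before the hook is touched.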
