gopidesupavan commented on code in PR #1483:
URL: https://github.com/apache/airflow-site/pull/1483#discussion_r3018361574
##########
landing-pages/site/content/en/blog/common-ai-provider/index.md:
##########
@@ -0,0 +1,352 @@
---
title: "Introducing the Common AI Provider: LLM and AI Agent Support for Apache Airflow"
linkTitle: "Introducing the Common AI Provider"
authors:
  - name: "Kaxil Naik"
    github: "kaxil"
    linkedin: "kaxil"
  - name: "Pavan Kumar Gopidesu"
    github: "gopidesupavan"
    linkedin: "pavan-kumar-gopidesu"
description: "The Common AI Provider adds LLM and AI agent operators to Apache Airflow with 6 operators, 5 toolsets, and 20+ model providers in one package."
tags: [Community, Release]
date: "2026-04-03"
images: ["/blog/common-ai-provider/images/common-ai-provider.png"]
---

At [Airflow Summit 2025](https://airflowsummit.org/sessions/2025/airflow-as-an-ai-agents-toolkit-unlocking-1000-integrations-with-mcp/), we previewed what native AI integration in Apache Airflow could look like. Today we're shipping it.

**[`apache-airflow-providers-common-ai`](https://pypi.org/project/apache-airflow-providers-common-ai/) 0.1.0** adds LLM and agent capabilities directly to Airflow. Not a wrapper around another framework, but a provider package that plugs into the orchestrator you already run. It's built on [Pydantic AI](https://ai.pydantic.dev/) and supports 20+ model providers (OpenAI, Anthropic, Google, Azure, Bedrock, Ollama, and more) through a single install.

```bash
pip install 'apache-airflow-providers-common-ai'
```

Requires Apache Airflow 3.0+.

> **Note:** This is a 0.x release. We're actively looking for feedback and iterating fast, so breaking changes are possible between minor versions. Try it, tell us what works and what doesn't. Your input directly shapes the API.
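
The examples below reference connection IDs such as `my_openai_conn`. One way to define such a connection is Airflow's `AIRFLOW_CONN_<CONN_ID>` environment-variable convention, which accepts a JSON connection representation. This is only a sketch: the `conn_type` value and JSON fields shown are assumptions for illustration, not confirmed values from the provider's docs.

```shell
# Sketch: define the "my_openai_conn" connection via an environment variable.
# Airflow resolves AIRFLOW_CONN_<CONN_ID> (conn id upper-cased) at runtime.
# NOTE: the conn_type and field names here are assumptions; check the
# provider documentation for the exact connection types it registers.
export AIRFLOW_CONN_MY_OPENAI_CONN='{"conn_type": "openai", "password": "<your-api-key>"}'
```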

## By the Numbers

|         |                                           |
|---------|-------------------------------------------|
| **6**   | Operators                                 |
| **6**   | TaskFlow decorators                       |
| **5**   | Toolsets                                  |
| **4**   | Connection types                          |
| **20+** | Supported model providers via Pydantic AI |

## The Decorator Suite

Every operator has a matching TaskFlow decorator.

### `@task.llm`: Single LLM Call

Send a prompt, get text or structured output back.

```python
from pydantic import BaseModel
from airflow.providers.common.compat.sdk import dag, task


@dag
def my_pipeline():
    class Entities(BaseModel):
        names: list[str]
        locations: list[str]

    @task.llm(
        llm_conn_id="my_openai_conn",
        system_prompt="Extract named entities.",
        output_type=Entities,
    )
    def extract(text: str):
        return f"Extract entities from: {text}"

    extract("Alice visited Paris and met Bob in London.")


my_pipeline()
```

The LLM returns a typed `Entities` object, not a string you have to parse. Downstream tasks get structured data through `XCom`.

### `@task.agent`: Multi-Step Agent with Tools

When the LLM needs to query databases, call APIs, or read files across multiple steps, use `@task.agent`. The agent picks which tools to call and loops until it has an answer.

```python
from airflow.providers.common.ai.toolsets.sql import SQLToolset
from airflow.providers.common.compat.sdk import dag, task


@dag
def sql_analyst():
    @task.agent(
        llm_conn_id="my_openai_conn",
        system_prompt="You are a SQL analyst. Use tools to answer questions with data.",
        toolsets=[
            SQLToolset(
                db_conn_id="postgres_default",
                allowed_tables=["customers", "orders"],
                max_rows=20,
            )
        ],
    )
    def analyze(question: str):
        return f"Answer this question about our data: {question}"

    analyze("What are the top 5 customers by order count?")


sql_analyst()
```

Under the hood, the agent calls `list_tables`, `get_schema`, and `query` on its own until it has the answer.
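
Since the decorator examples above only run inside an Airflow deployment, here is a minimal standalone sketch of the typed-output idea the `@task.llm` example relies on: the `Entities` model validates the structured payload an LLM would return, so downstream code works with attributes instead of parsing strings. The JSON payload below is invented for illustration.

```python
from pydantic import BaseModel


class Entities(BaseModel):
    names: list[str]
    locations: list[str]


# Invented payload standing in for the LLM's structured output.
raw = '{"names": ["Alice", "Bob"], "locations": ["Paris", "London"]}'

# Pydantic validates the JSON against the schema; a malformed or
# mistyped payload raises a ValidationError instead of silently passing.
entities = Entities.model_validate_json(raw)
print(entities.names)      # -> ['Alice', 'Bob']
print(entities.locations)  # -> ['Paris', 'London']
```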

### `@task.llm_branch`: LLM-Powered Branching

The LLM decides which downstream task(s) to run. No string parsing: the model's reply is constrained to an enum built from the task's downstream IDs.

```python
@task.llm_branch(
    llm_conn_id="my_openai_conn",
    system_prompt="Classify the support ticket priority.",
)
def route_ticket(ticket_text: str):
    return f"Classify this ticket: {ticket_text}"
```

### `@task.llm_sql`: Text-to-SQL with Safety Rails

Generates SQL from natural language. The operator introspects your database schema and validates the generated SQL via AST parsing ([sqlglot](https://github.com/tobymao/sqlglot)) before execution.

```python
from airflow.providers.common.compat.sdk import dag, task


@dag
def sql_generator():
    @task.llm_sql(
        llm_conn_id="my_openai_conn",
        db_conn_id="postgres_default",
        table_names=["orders", "customers"],
        dialect="postgres",
    )
    def build_query(ds=None):
        return f"Find customers who placed no orders after {ds}"

    build_query()


sql_generator()
```

### `@task.llm_file_analysis`: Analyze Files with LLMs

Point it at files in object storage (S3, GCS, local) and let the LLM analyze them. Supports CSV, Parquet, Avro, JSON, Markdown, and images (multimodal).

Review Comment:
```suggestion
Point it at files in object storage (S3, GCS, local) and let the LLM analyze them. Supports CSV, Parquet, Avro, JSON, and images (multimodal).
```
Markdown not yet supported, may be one to add.. will note down.. :)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
