kaxil opened a new pull request, #67786:
URL: https://github.com/apache/airflow/pull/67786

   Adds [Agent Skills](https://agentskills.io) support to the `common.ai` 
provider as `AgentSkillsToolset`, a pydantic-ai toolset (alongside 
`SQLToolset`, `HookToolset`, `MCPToolset`). Skills are `SKILL.md` bundles the 
model discovers and loads on demand (progressive disclosure), so a large skill 
library costs few tokens until a skill is actually used.
   
   ```python
   from airflow.providers.common.ai.operators.agent import AgentOperator
   from airflow.providers.common.ai.skills import GitSkills
   from airflow.providers.common.ai.toolsets.skills import AgentSkillsToolset
   
   AgentOperator(
       task_id="agent",
       prompt="...",
       llm_conn_id="pydanticai_default",
       toolsets=[
           AgentSkillsToolset(sources=[
               "./skills",  # local SKILL.md directory
               GitSkills(repo_url="https://github.com/my-org/agent-skills",
                         conn_id="github_skills", path="skills"),  # private repo (git connection)
           ]),
       ],
   )
   ```
   
   Backed by the community 
[pydantic-ai-skills](https://github.com/DougTrajano/pydantic-ai-skills) package 
(MIT), pulled in only through the optional `skills` extra:
   
   ```bash
   pip install "apache-airflow-providers-common-ai[skills]"
   ```
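
To make the progressive-disclosure idea concrete, here is a minimal sketch of the pattern, not the pydantic-ai-skills implementation. It assumes each skill lives in its own directory with a `SKILL.md` that starts with a `---` frontmatter block carrying `name` and `description`. `SkillSummary`, `discover`, and `load` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class SkillSummary:
    """Cheap metadata shown to the model up front (hypothetical type)."""

    name: str
    description: str
    path: Path


def read_frontmatter(skill_md: Path) -> Dict[str, str]:
    """Parse simple `key: value` pairs between the leading `---` markers."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def discover(root: Path) -> List[SkillSummary]:
    """List skills by metadata only; the instruction bodies stay on disk."""
    found = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        found.append(
            SkillSummary(
                name=meta.get("name", skill_md.parent.name),
                description=meta.get("description", ""),
                path=skill_md,
            )
        )
    return found


def load(summary: SkillSummary) -> str:
    """Return the full SKILL.md text only when the skill is actually used."""
    return summary.path.read_text(encoding="utf-8")
```

Only the short summaries from `discover` would go into the prompt; `load` is called for the one skill the model picks, which is why a large library stays cheap until a skill is used.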
   
   ## Why
   
   pydantic-ai has no native skills primitive yet; native progressive 
disclosure is in flight upstream in 
[pydantic/pydantic-ai#5230](https://github.com/pydantic/pydantic-ai/pull/5230). 
This wires the community implementation in behind a small toolset so users get 
connection-based skill loading today, with a surface that maps onto the native 
primitive when it lands.
   
   ## Design notes
   
   - **Resolved at run time, not parse time.** The underlying `SkillsToolset` 
loads its registries eagerly at construction, so building a Git-backed toolset 
in the DAG body would clone the repo while the DAG processor parses the file 
and bake the token into the serialized DAG. `AgentSkillsToolset` instead 
resolves connections and clones on `__aenter__` (when the agent enters the 
toolset, on the worker) and removes cloned directories on `__aexit__`. A Git 
token is never present in the serialized DAG; only the `conn_id` is.
   - **Reusable beyond the operator.** `AgentSkillsToolset` is a normal 
pydantic-ai `AbstractToolset`, so it also works with a raw `pydantic_ai.Agent` 
you build yourself (anywhere the Airflow connection backend is reachable). The 
operator is unchanged; skills are just a toolset.
   - **Framework-portable core.** Because Agent Skills is a cross-framework 
format, the connection handling is exposed framework-agnostically through 
`resolve_skills(...)`, which returns local `SKILL.md` directories that any 
loader accepts:
   
     ```python
     from airflow.providers.common.ai.skills import GitSkills, resolve_skills
   
     with resolve_skills(["./skills", GitSkills(repo_url="https://...", conn_id="github_skills")]) as dirs:
         create_deep_agent(model="openai:gpt-5.4", skills=dirs)   # LangChain DeepAgents
         Agent(plugins=[AgentSkills(skills=dirs)])                # Strands
     ```
     `resolve_skills` needs the Git provider (for `GitSkills`) but not 
pydantic-ai.
   - **Git and local sources only for now.** Object storage (S3/GCS) is deferred so the 
recursive-download layout and lifecycle can be verified against a real bucket 
first.
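
The run-time resolution described in the first design note can be sketched as an async context manager that stores only a connection id at construction and acquires its checkout on entry. This is an illustration of the pattern under stated assumptions, not the `AgentSkillsToolset` code: `LazySkillSource` is a hypothetical class, and a temporary directory stands in for the real connection lookup and clone.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Tuple


class LazySkillSource:
    """Hypothetical stand-in that holds only a conn_id until it is entered."""

    def __init__(self, conn_id: str) -> None:
        # Nothing secret and no I/O here, so building this at parse time is safe.
        self.conn_id = conn_id
        self.checkout: Optional[Path] = None

    async def __aenter__(self) -> "LazySkillSource":
        # A real implementation would resolve the connection and clone here;
        # an empty temporary directory stands in for the checkout.
        self.checkout = Path(tempfile.mkdtemp(prefix="skills-"))
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Remove the checkout whether or not the body raised.
        if self.checkout is not None:
            shutil.rmtree(self.checkout, ignore_errors=True)
            self.checkout = None


async def main() -> Tuple[bool, bool]:
    source = LazySkillSource(conn_id="github_skills")
    async with source:
        path = source.checkout
        existed_inside = path is not None and path.is_dir()
    existed_after = path is not None and path.exists()
    return existed_inside, existed_after


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the constructor does no work, serializing the object captures the `conn_id` and nothing else; the checkout exists only between entry and exit.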
   
   ## Security and gotchas
   
   - Skill bundles can contain scripts the agent may run on the worker through 
pydantic-ai-skills' `run_skill_script` tool. This keeps the upstream default 
and is documented: point `GitSkills` at a trusted repository and pin `branch` 
to a trusted ref.
   - `GitSkills` credentials come from an Airflow `git` connection resolved 
through the Git provider's `GitHook` (HTTPS token in the password, or an SSH 
key in the extra). Nothing is read from the worker environment: omitting 
`conn_id` means an anonymous clone, and plain `http://` with a `conn_id` is 
rejected so a credential is never sent in cleartext. After cloning, the token 
is stripped from the checkout's `.git/config`.
   - The `skills` extra pulls `apache-airflow-providers-git` (GitHook + 
GitPython) and `pydantic-ai-skills` (which requires `pydantic-ai-slim>=1.74`; 
the provider base floor stays at 1.71).
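
The cleartext guard and the post-clone token stripping can be illustrated with the standard library alone. The sketch below is an assumption about the shape of the check, not the `GitHook` or `GitSkills` code: `authenticated_url` and `strip_credentials` are hypothetical helpers, the token-as-userinfo URL form is one common convention, and SSH-key connections are left out.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_url(repo_url: str, token: Optional[str]) -> str:
    """Return the URL to clone from, refusing to send a token over plain HTTP."""
    if token is None:
        return repo_url  # no connection given: anonymous clone
    parts = urlsplit(repo_url)
    if parts.scheme == "http":
        raise ValueError("refusing to send a credential over plain http://")
    if parts.scheme != "https":
        return repo_url  # e.g. SSH URLs, which this sketch does not cover
    netloc = f"{quote(token, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def strip_credentials(url: str) -> str:
    """Drop any userinfo so the URL is safe to leave in a checkout's config."""
    parts = urlsplit(url)
    netloc = parts.hostname or ""
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

In this sketch, the URL from `authenticated_url` would be used for the clone only, and the remote would then be rewritten with `strip_credentials` so the token does not persist on disk.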
   
   ## Follow-ups
   
   - Object-storage skill sources (S3/GCS via `ObjectStoragePath`).
   - Migrate to native pydantic-ai on-demand capabilities once 
[#5230](https://github.com/pydantic/pydantic-ai/pull/5230) ships.
   

