kaxil opened a new pull request, #67786: URL: https://github.com/apache/airflow/pull/67786
Adds [Agent Skills](https://agentskills.io) support to the `common.ai` provider as `AgentSkillsToolset`, a pydantic-ai toolset (alongside `SQLToolset`, `HookToolset`, `MCPToolset`). Skills are `SKILL.md` bundles the model discovers and loads on demand (progressive disclosure), so a large skill library costs few tokens until a skill is actually used.

```python
from airflow.providers.common.ai.operators.agent import AgentOperator
from airflow.providers.common.ai.skills import GitSkills
from airflow.providers.common.ai.toolsets.skills import AgentSkillsToolset

AgentOperator(
    task_id="agent",
    prompt="...",
    llm_conn_id="pydanticai_default",
    toolsets=[
        AgentSkillsToolset(sources=[
            "./skills",  # local SKILL.md directory
            GitSkills(repo_url="https://github.com/my-org/agent-skills",
                      conn_id="github_skills", path="skills"),  # private repo (git connection)
        ]),
    ],
)
```

Backed by the community [pydantic-ai-skills](https://github.com/DougTrajano/pydantic-ai-skills) package (MIT), pulled in only through the optional `skills` extra:

```bash
pip install "apache-airflow-providers-common-ai[skills]"
```

## Why

pydantic-ai has no native skills primitive yet; native progressive disclosure is in flight upstream in [pydantic/pydantic-ai#5230](https://github.com/pydantic/pydantic-ai/pull/5230). This wires the community implementation in behind a small toolset so users get connection-based skill loading today, with a surface that maps onto the native primitive when it lands.

## Design notes

- **Resolved at run time, not parse time.** The underlying `SkillsToolset` loads its registries eagerly at construction, so building a Git-backed toolset in the DAG body would clone the repo while the DAG processor parses the file and bake the token into the serialized DAG. `AgentSkillsToolset` instead resolves connections and clones on `__aenter__` (when the agent enters the toolset, on the worker) and removes cloned directories on `__aexit__`.
  A Git token is never present in the serialized DAG; only the `conn_id` is.
- **Reusable beyond the operator.** `AgentSkillsToolset` is a normal pydantic-ai `AbstractToolset`, so it also works with a raw `pydantic_ai.Agent` you build yourself (anywhere the Airflow connection backend is reachable). The operator is unchanged; skills are just another toolset.
- **Framework-portable core.** Because Agent Skills is a cross-framework format, the connection handling is exposed framework-agnostically through `resolve_skills(...)`, which returns local `SKILL.md` directories that any loader accepts (it needs only GitPython, no pydantic-ai):

  ```python
  from airflow.providers.common.ai.skills import GitSkills, resolve_skills

  with resolve_skills(["./skills", GitSkills(repo_url="https://...", conn_id="github_skills")]) as dirs:
      create_deep_agent(model="openai:gpt-5.4", skills=dirs)  # LangChain DeepAgents
      Agent(plugins=[AgentSkills(skills=dirs)])  # Strands
  ```

  `resolve_skills` needs the Git provider (for `GitSkills`) but not pydantic-ai.
- **Git and local sources only for now.** Object storage (S3/GCS) is deferred so the recursive-download layout and lifecycle can be verified against a real bucket first.

## Security and gotchas

- Skill bundles can contain scripts the agent may run on the worker through pydantic-ai-skills' `run_skill_script` tool. This keeps the upstream default and is documented: point `GitSkills` at a trusted repository and pin `branch` to a trusted ref.
- `GitSkills` credentials come from an Airflow `git` connection resolved through the Git provider's `GitHook` (HTTPS token in the password, or an SSH key in the extra). Nothing is read from the worker environment: omitting `conn_id` means an anonymous clone, and plain `http://` with a `conn_id` is rejected so a credential is never sent in cleartext. After cloning, the token is stripped from the checkout's `.git/config`.
- The `skills` extra pulls `apache-airflow-providers-git` (GitHook + GitPython) and `pydantic-ai-skills` (which requires `pydantic-ai-slim>=1.74`; the provider base floor stays at 1.71).

## Follow-ups

- Object-storage skill sources (S3/GCS via `ObjectStoragePath`).
- Migrate to native pydantic-ai on-demand capabilities once [#5230](https://github.com/pydantic/pydantic-ai/pull/5230) ships.
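
For reviewers who want the "resolved at run time, not parse time" design note in miniature, here is a rough sketch of the lifecycle it describes. `LazyCheckout` is a hypothetical stand-in, not the provider's class: it stores only a `conn_id` at construction, creates a temporary directory on `__aenter__` (where a real toolset would resolve the connection and clone), and removes it on `__aexit__`.

```python
from __future__ import annotations

import asyncio
import shutil
import tempfile
from pathlib import Path


class LazyCheckout:
    """Hypothetical illustration of deferring work to __aenter__."""

    def __init__(self, conn_id: str) -> None:
        # Only the connection id is kept, so constructing the object at DAG
        # parse time neither clones anything nor captures a secret.
        self.conn_id = conn_id
        self.path: Path | None = None

    async def __aenter__(self) -> LazyCheckout:
        # Assumption: a real toolset would resolve the connection and clone
        # here. This sketch only creates an empty temporary directory.
        self.path = Path(tempfile.mkdtemp(prefix="skills-"))
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Remove the checkout whether or not the body raised.
        if self.path is not None:
            shutil.rmtree(self.path, ignore_errors=True)
            self.path = None


async def demo() -> tuple[bool, bool, bool]:
    checkout = LazyCheckout(conn_id="github_skills")
    before = checkout.path is None      # nothing created at construction
    async with checkout as entered:
        created = entered.path
        during = created.is_dir()       # directory exists while entered
    after = not created.exists()        # directory removed on exit
    return before, during, after


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the pattern is that the object serialized with the DAG carries only the `conn_id`; everything that touches credentials or the filesystem happens inside the `async with` scope on the worker.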

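
The credential rules under "Security and gotchas" (reject plain `http://` when a `conn_id` is given, and keep the token out of the stored remote URL) can be illustrated roughly as follows. Both helper names are hypothetical and the logic is an assumption about the general technique, not the provider's actual code; in particular, how the provider rewrites `.git/config` is not shown in this PR description.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def validate_repo_url(repo_url: str, conn_id: str | None) -> None:
    """Hypothetical check: refuse cleartext HTTP when a credential is in play."""
    scheme = urlsplit(repo_url).scheme.lower()
    if scheme == "http" and conn_id is not None:
        raise ValueError("refusing to send credentials over plain http://")


def strip_userinfo(url: str) -> str:
    """Hypothetical helper: drop any user:token@ prefix from a URL.

    IPv6 literal hosts are not handled in this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    validate_repo_url("https://github.com/my-org/agent-skills", "github_skills")
    print(strip_userinfo("https://user:token@github.com/my-org/agent-skills"))
```

An anonymous `http://` clone passes the check because no credential would be transmitted; only the combination of `http://` and a `conn_id` is refused.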