This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new 9235817  docs: add MISSION.md — Apache <PROJECT_NAME> TLP proposal draft (#57)
9235817 is described below

commit 92358172050c23db1b6cc7bd9a55c708dcb5f24e
Author: Jarek Potiuk <[email protected]>
AuthorDate: Tue May 5 15:24:20 2026 +0200

    docs: add MISSION.md — Apache <PROJECT_NAME> TLP proposal draft (#57)
    
    * docs: add MISSION.md — Apache <PROJECT_NAME> TLP proposal draft
    
    Working draft of the project-establishment proposal for the
    Apache <PROJECT_NAME> TLP — agent-assisted repository-maintainership
    infrastructure built on the work that became apache-steward, with
    non-ASF-as-first-class adoption, mentoring as a first-class mode,
    security-issue handling as a load-bearing use case, and privacy +
    supply-chain integrity + vendor neutrality as foundational
    commitments.
    
    Sections: Abstract, Proposal, Proposed Names, Rationale, Initial
    Goals, Technical scope, Maintainer education, Privacy / security /
    supply-chain integrity, Affordability and vendor neutrality, Initial
    PMC composition target, Required resources, Source and IP, External
    dependencies, Cryptography, Particular care, Ask of the Board.
    
    References section is intentionally omitted until the linked RFCs
    (RFC-AI-0002, RFC-AI-0003) and the ASF Responsible AI Initiative
    description are public.
    
    Generated-by: Claude Code (Claude Opus 4.7)
    
    * Apply suggestion from @potiuk
    
    * docs(MISSION): add !Draft callout at the top
    
    GitHub-flavored alert (`> [!IMPORTANT]`) before the motto, naming the
    document as a working draft so readers don't mistake it for a final
    committee-ready proposal.
    
    Generated-by: Claude Code (Claude Opus 4.7)
    
    * docs(MISSION): regenerate doctoc TOC + cross-reference from README
    
    Two fixes in one commit:
    
    - Regenerate the doctoc TOC at the top of MISSION.md, which CI's prek
      run flagged on PR #57. The TOC is generated by doctoc; this commit
      lands the regenerated output.
    - Add a prominent !IMPORTANT callout near the top of README.md
      pointing at MISSION.md as the *why* (the draft project-establishment
      proposal), with the README being the *how* once an adopter has
      decided. Also added a Cross-references entry.
    
    Generated-by: Claude Code (Claude Opus 4.7)
    
    * fix(MISSION): use backticked <PROJECT_NAME> placeholder
    
    CI flagged two failures stemming from the same root cause:
    
    - markdownlint MD051 — the doctoc-generated TOC entry pointed at
      `#apache-ltproject_namegt`, which is not a fragment GitHub
      actually renders. The H1 used HTML entities (`&lt;PROJECT_NAME&gt;`)
      to keep the angle brackets visible, but doctoc slugified the
      literal `&lt;` / `&gt;` characters.
    - lychee — same fragment-not-found error against the same anchor.
    
    Fix: replace `&lt;PROJECT_NAME&gt;` with backticked `<PROJECT_NAME>`
    throughout the document (matches the rest of the framework's
    docs, where placeholders like `<tracker>`, `<upstream>`,
    `<security-list>` are consistently backticked). Doctoc
    regenerates a clean `#apache-project_name` anchor.
    
    Generated-by: Claude Code (Claude Opus 4.7)
---
 MISSION.md | 204 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md  |   8 +++
 2 files changed, 212 insertions(+)

diff --git a/MISSION.md b/MISSION.md
new file mode 100644
index 0000000..fd54c40
--- /dev/null
+++ b/MISSION.md
@@ -0,0 +1,204 @@
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Apache `<PROJECT_NAME>`](#apache-project_name)
+  - [Abstract](#abstract)
+  - [Proposal](#proposal)
+  - [Proposed Names](#proposed-names)
+  - [Rationale](#rationale)
+  - [Initial Goals](#initial-goals)
+  - [Technical scope](#technical-scope)
+  - [Maintainer education — building agentic projects is a different craft](#maintainer-education--building-agentic-projects-is-a-different-craft)
+  - [Privacy, security, and supply-chain integrity — the top-most priority](#privacy-security-and-supply-chain-integrity--the-top-most-priority)
+  - [Affordability and vendor neutrality — the public-good commitment](#affordability-and-vendor-neutrality--the-public-good-commitment)
+  - [Initial PMC composition (target)](#initial-pmc-composition-target)
+  - [Required resources](#required-resources)
+  - [Source and IP](#source-and-ip)
+  - [External dependencies](#external-dependencies)
+  - [Cryptography](#cryptography)
+  - [Particular care](#particular-care)
+  - [Ask of the Board](#ask-of-the-board)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# Apache `<PROJECT_NAME>`
+
+> [!IMPORTANT]
+> **Draft:** this proposal is a working draft, actively being edited. Expect substantial changes; it is not yet ready for committee or Board review.
+
+> **Motto:** *"Give maintainers time back, so they can do what matters."*
+
+## Abstract
+
+Apache `<PROJECT_NAME>` is platform infrastructure for **agent-assisted repository maintainership**, across the ASF and equally for any open-source project that wants to adopt it. It covers three streams of day-to-day work:
+
+- **Security-issue handling** end-to-end — inbound triage, deduplication, 
agent-drafted reporter replies under human review, CVE allocation hand-off, 
audit-logged status tracking through publication.
+- **Issue and PR triage and management** — including audit-tool findings (from Apache Verum, Apache Caer, or equivalent tools) ingested as actionable issues.
+- **Conversational contributor mentoring** — meeting new contributors where 
they are.
+
+One conviction underlies all three: each project decides how much automation actually fits it. The platform makes a range of automation levels possible without choosing one for you, and "project" means both an ASF PMC and any non-ASF community; neither is a second-class citizen.
+
+## Proposal
+
+The Apache Software Foundation establishes the Apache `<PROJECT_NAME>` Project as a Top-Level Project by Board resolution, with a scope of agent-assisted repository-maintainership infrastructure released under the Apache License, Version 2.0.
+
+## Proposed Names
+
+The initial committee will discuss and vote on the final name. Starting 
candidates (the list isn't closed — anyone on the initial committee can pitch a 
different name):
+
+- Apache Mentor
+- Apache Guild
+- Apache Minerva
+- Apache Magpie
+- Apache Beacon
+- Apache Compass
+- Apache Lexicon
+- Apache Polyglot
+
+## Rationale
+
+Open-source projects share the same shape of problem: contributors keep 
arriving, reviewers don't scale to match, and the highest-stakes work — 
security-issue handling — is the *most* manual, the *most* reviewer-intensive, 
and the *most* embarrassing to get wrong. The two complaints heard most loudly 
— **onboarding latency** and **review-cycle latency** — are the priorities the 
ASF Responsible AI Initiative names. `<PROJECT_NAME>` is the operational layer 
for those goals: not a position  [...]
+
+Three design choices set the project apart from "just bolt a code-review bot 
on it":
+
+**Project autonomy is the structural starting point — and "project" includes 
non-ASF.** Four modes (A – triage, B – mentoring, C – agent-authored fix with 
human review, D – narrowly-scoped auto-merge) ship as separate, 
independently-toggleable skills. Each project picks the modes that match its 
culture and risk tolerance. ASF integrations (private lists, Vulnogram CVE 
flows, PMC roles, ASF release process) live behind clean configuration 
boundaries; non-ASF adopters swap them for whateve [...]
+
+**Security-issue handling is a load-bearing use case, not a footnote on 
triage.** The work that became `<PROJECT_NAME>` started as a framework for 
handling ASF security reports — high-stakes, high-procedure, 
every-step-needs-an-audit-trail flows that turn out to be exactly what 
agent-assisted-with-human-gates is good at. Every Mode A/B/C/D capability has 
to clear the security-flow bar (private content stays private, every outbound 
draft has a human signature, every state change is logged [...]
+
+**Mentoring is a first-class mode, not a side-effect of triage.** It is the lever the ASF, and the wider open-source world, actually needs, and the one off-the-shelf agent tooling skips. The mentoring mode meets new contributors where they are, explains conventions, points at the relevant prior PR, and asks the clarifying question *before* a reviewer burns time on it. This is where the Responsible AI Initiative's contributor-empowerment goal gets operationalised: the mode that produces the outcomes RAI is trying to [...]
+
+## Initial Goals
+
+- Stand up `github.com/apache/<PROJECT_NAME>` with project skeleton, CI, and 
contributor docs.
+- Provision standard ASF infrastructure: `private@`, `dev@`, `commits@`; 
GitHub Issues; site at `<PROJECT_NAME>.apache.org`.
+- Get modes A–C running against **3–4 friendly pilots within 3 months**: at least one ASF PMC running the full security-issue flow (Airflow, given the project's lineage), one ASF PMC running just triage + mentoring (Arrow or ATR), and **at least one non-ASF project from day one** (Python core developers have expressed interest). Non-ASF projects belong in the first cohort, not a later one: the claim that the project is governance-agnostic is only worth what the pilots can prove.
+- Cut a first Apache release through the standard process within 3 months of 
resolution adoption, with artefacts usable directly by non-ASF adopters (no 
ASF-only assumption baked into the install path).
+- Wire modes A–C up to Apache Verum and Apache Caer findings, and to at least 
one non-ASF audit-tool equivalent (a CodeQL output stream is the likely first 
non-ASF case).
+- Settle on a contributor-sentiment evaluation methodology with Apache Plumb (separate proposal). The evaluation covers both ASF and non-ASF cohorts so the data isn't an internal-ASF artefact.
+- **Ship the privacy and security posture** as a release-blocking part of v1 — 
sandbox setup, clean-env wrapper, privacy-LLM gate, PII redactor, signed 
releases, pinned-tools manifest. Not a follow-up.
+- **Ship the maintainer-education stream** alongside v1 — pattern catalogue, 
"your first skill" path, first scheduled workshops. The platform is only as 
adoptable as the docs that go with it.
+- **Validate vendor-neutrality** in v1 pilots: at least one project running 
modes A–C against a frontier-model backend, one against fully-local inference 
(Ollama / vLLM), one against an Apache-hosted or Apache-aligned endpoint as it 
becomes available.
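The "pinned-tools manifest" goal pairs with the documented cooldown window described under supply-chain integrity below. A minimal Python sketch of such a check, assuming a hypothetical manifest that records each tool's pinned version and upstream release date, and a hypothetical 14-day window (the proposal does not state the window's length):

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical window length; the proposal only says the window is documented.
COOLDOWN = timedelta(days=14)


@dataclass(frozen=True)
class PinnedTool:
    """One entry of a hypothetical pinned-tools manifest."""
    name: str
    version: str
    released: date  # upstream release date of the pinned version


def too_fresh(manifest: list[PinnedTool], today: date,
              cooldown: timedelta = COOLDOWN) -> list[str]:
    """Return "name==version" for every pin still inside the cooldown window."""
    return [
        f"{tool.name}=={tool.version}"
        for tool in manifest
        if today - tool.released < cooldown
    ]


# Placeholder versions and dates, for illustration only.
manifest = [
    PinnedTool("bubblewrap", "1.2.3", date(2026, 3, 1)),
    PinnedTool("socat", "4.5.6", date(2026, 4, 30)),
]
print(too_fresh(manifest, today=date(2026, 5, 5)))  # ['socat==4.5.6']
```

In this sketch a CI job would fail a bump PR whenever the returned list is non-empty, which keeps bumps as reviewable PRs rather than silent updates.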
+
+## Technical scope
+
+A platform substrate — issue and PR ingestion, GitHub API write-back, 
conversation threading, audit logging, integration with adjacent systems 
(Gmail, PonyMail, Vulnogram, generic CVE submission, an extensible adapter 
layer so non-ASF adopters plug in their own equivalents) — with four modes 
built on top:
+
+**Mode A — triage assistant** for issues, security reports, and PRs. *On the 
security side:* spots inbound reports, classifies against prior triaged cases, 
surfaces likely duplicates, identifies anything that should not have been filed 
publicly, proposes initial routing to the security team. *On the regular side:* 
suggests labels, spots duplicates, links related discussions, proposes routing. 
Every output is a suggestion the human signs off on; nothing lands without 
review. Lowest risk surface.
+
+**Mode B — conversational mentoring**. Joins issue and PR threads in a deliberately teaching register: clarifying questions, pointers to project conventions and docs, an explanation of *why* a change is being asked for, paired examples from similar prior PRs, clean hand-off to a human reviewer when the question exceeds what an agent should answer. This is the differentiator and the highest-value mode: it is where the Responsible AI Initiative's empowerment outcome lives.
+
+**Mode C — agent-authored fixes with human review**. The agent drafts a fix 
for a well-scoped problem (a tracked issue, a triaged security report with team 
consensus on scope, an Apache Verum or Apache Caer finding, a failing test with 
an obvious cause, a documentation hole) and opens a PR. Every PR is reviewed 
and merged by a human committer; the agent never merges its own work. For 
security PRs the public surface strips CVE / private context per the project's 
disclosure policy, so the  [...]
+
+**Mode D — narrowly-scoped fix-and-merge**. Auto-merge restricted to objectively boring change classes: lint fixes, dependency bumps inside an allow-list, license-header insertion, formatting, broken-link repair. Opt-in is required both per project AND per change class; every auto-merged change is logged and reversible. **Not turned on** until A/B/C have been running for two quarters and contributor-sentiment data says the project is healthier, not just faster. Security-class changes are explicitly *out* of D — no [...]
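The Mode D rules (per-project AND per-class opt-in, an allow-list for dependency bumps, security changes always excluded) reduce to a small deterministic gate that runs before any auto-merge. A Python sketch; `ProjectConfig`, `ProposedChange`, `may_auto_merge`, and the change-class labels are hypothetical names, and the two-quarter / contributor-sentiment precondition is assumed to be what a PMC checks before it sets `mode_d_enabled`:

```python
from dataclasses import dataclass
from typing import Optional

# Change classes the proposal lists as "objectively boring" (labels are hypothetical).
BORING_CLASSES = frozenset({
    "lint-fix", "dependency-bump", "license-header", "formatting", "broken-link",
})


@dataclass(frozen=True)
class ProjectConfig:
    mode_d_enabled: bool = False                    # per-project opt-in
    mode_d_classes: frozenset = frozenset()         # per-class opt-in
    dependency_allow_list: frozenset = frozenset()  # packages eligible for auto-bump


@dataclass(frozen=True)
class ProposedChange:
    change_class: str
    security_related: bool = False
    package: Optional[str] = None                   # only set for dependency bumps


def may_auto_merge(cfg: ProjectConfig, change: ProposedChange) -> bool:
    """Return True only if every Mode D condition holds; anything else goes to a human."""
    if change.security_related:
        return False                                # security-class changes are never in D
    if not cfg.mode_d_enabled:
        return False
    # The class must be both inherently boring and explicitly opted into by the project.
    if change.change_class not in BORING_CLASSES & cfg.mode_d_classes:
        return False
    if change.change_class == "dependency-bump":
        return change.package in cfg.dependency_allow_list
    return True
```

Defaulting every field to "off" means that, in this sketch, an unconfigured project never auto-merges anything.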
+
+The substrate also handles per-project config (which modes are on, eligible 
change classes, who reviews, how disputes route, where security reports come 
from, where audit findings come from, what the release process expects), full 
audit logging and rollback for every agent-authored change — security and 
non-security alike — and an integration hook for the Apache Plumb eval 
framework so the contributor-empowerment claim has measurable data behind it.
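The audit-logging and rollback requirement can be pictured as one append-only record per agent action, stating whether and how that action can be undone. A minimal sketch that writes JSON Lines; the record fields, `log_action`, and the undo strings are hypothetical:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    actor: str             # which mode / skill acted, e.g. "mode-a"
    action: str            # e.g. "add-label", "open-pr"
    target: str            # issue or PR reference
    reversible: bool
    undo: Optional[str]    # how to roll back; None means "flag for a human"
    at: str                # UTC timestamp, ISO 8601


def log_action(path: str, actor: str, action: str, target: str,
               undo: Optional[str] = None) -> AuditRecord:
    """Append one record to a JSON Lines file and return it."""
    record = AuditRecord(
        actor=actor, action=action, target=target,
        reversible=undo is not None, undo=undo,
        at=datetime.now(timezone.utc).isoformat(),
    )
    # Append mode: earlier entries are never rewritten.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
    return record
```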
+
+## Maintainer education — building agentic projects is a different craft
+
+Most maintainers have never built an agentic application before. The mental 
model is genuinely different from what twenty years of writing services and 
CLIs trained us for: behaviour is **probabilistic, not deterministic**; prompts 
and skill files **are code** in every meaningful sense; **evaluating output is 
harder than testing a function**; the unit of authorship shifts from "a 
function in a file" to "a skill the agent invokes". The instincts that keep 
regular code reliable — strict ty [...]
+
+`<PROJECT_NAME>` runs a maintainer-facing education stream as a **first-class 
part of the project**, not an afterthought wiki page:
+
+- **Pattern catalogue** — copy-pasteable skill / prompt / tool-use patterns 
with notes on what worked, what didn't, and why. The same way the early days of 
Python testing or distributed systems were taught: war stories with code 
attached.
+- **Eval-driven development examples** — how to think about correctness when 
"correct" is a distribution. Worked examples from real `<PROJECT_NAME>` modes; 
integration with Apache Plumb so the eval methodology is shared, not reinvented 
per-project.
+- **Workshops and pairing sessions** — scheduled office-hour sessions where 
maintainers from any project (ASF or not) can show up with their use case and 
pair with the `<PROJECT_NAME>` team. Recordings published.
+- **A "your first skill" path** — equivalent of "your first PR" docs, but for 
landing a working skill in your project. Aim: any motivated maintainer can take 
a working agentic skill from zero to merged in a weekend, without first having 
to learn LLM internals.
+
+Every `<PROJECT_NAME>` release ships with the docs and patterns the 
maintainers using it actually need. The steepness of this learning curve is 
currently one of the larger barriers to broader agentic adoption in open 
source; lowering it is part of the platform's job.
+
+## Privacy, security, and supply-chain integrity — the top-most priority
+
+Most maintainers asked about agentic tooling lead with the same fears, in 
roughly this order:
+
+- *Will my credentials end up in some model provider's training data?*
+- *Will pre-disclosure CVE content leak out of the agent's context?*
+- *What does the agent's dependency tree look like, and who controls it?*
+- *Can a malicious issue or PR comment talk the agent into running something I 
didn't authorise?*
+- *Can the agent quietly exfiltrate code or contributor data?*
+- *If something goes wrong, can I see what happened and undo it?*
+
+These fears are not theoretical: they are the actual reason a lot of capable maintainers are *not* using agentic tools today, even when those tools would help. `<PROJECT_NAME>`'s response is baked into the project's foundation rather than retrofitted later:
+
+- **Clean-environment wrapper** around every agent invocation — no envvars 
from the surrounding shell unless explicitly allow-listed; no silent leakage of 
API keys, tokens, paths.
+- **Layered sandbox by default** — filesystem, network, and tool-permission 
rules enforced at the harness layer; sandbox bypasses surface a loud, visible 
warning before they run, never silently.
+- **Privacy-aware LLM routing** — private content (security reports, embargoed 
CVE detail, PMC-private mail) flows only to LLMs the project's PMC has 
explicitly approved, with a recorded data-residency contract. The framework 
refuses to route private bytes through a non-approved model. *Already 
implemented in the upstreamed framework that became `<PROJECT_NAME>`.*
+- **PII redaction at the boundary** — reporter identity flows where 
operationally needed (CVE credit, reply threads); third-party PII gets redacted 
to stable identifiers before any LLM context.
+- **Pinned, reviewed, signed dependencies** — every system tool (`bubblewrap`, 
`socat`, agent CLI) pinned to a version aged through a documented cooldown 
window. Bumps are PRs, not silent updates. Supply-chain risk treated like code 
change.
+- **Audit log every agent-authored action** — comments, labels, drafts, 
issues, PRs. Reversible where possible; flagged where not.
+- **Hard rule: external content is data, never instructions** — reporter mail, 
PR comments, GHSA forwards, attachments. Documented at the framework level, 
enforced at the skill level.
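Three of these controls (the clean-environment wrapper, the refusal to route private content to a non-approved model, and PII redaction to stable identifiers) are small enough to sketch. The Python below is illustrative only: the allow-list contents, the `approved_for_private` set, the e-mail-only redaction regex, and all function names are assumptions, not the framework's actual implementation:

```python
import hashlib
import hmac
import os
import re
import subprocess
from typing import Mapping

# Hypothetical allow-list; each project would configure its own.
ENV_ALLOW_LIST = ("PATH", "HOME", "LANG")


def clean_env(source: Mapping[str, str], allow=ENV_ALLOW_LIST) -> dict:
    """Copy only allow-listed variables; tokens and API keys are dropped by default."""
    return {name: source[name] for name in allow if name in source}


def run_agent(cmd: list) -> subprocess.CompletedProcess:
    """Run an agent CLI with the cleaned environment instead of the caller's full one."""
    return subprocess.run(cmd, env=clean_env(os.environ), check=True)


class UnapprovedModelError(RuntimeError):
    """Raised instead of sending private bytes to a model the PMC has not approved."""


def route(is_private: bool, model: str, approved_for_private: frozenset) -> str:
    """Return the model to use, or refuse outright for private content."""
    if is_private and model not in approved_for_private:
        raise UnapprovedModelError(f"refusing to send private content to {model!r}")
    return model


# Only e-mail addresses here; real PII detection needs far more than one regex.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str, key: bytes) -> str:
    """Replace each address with a keyed, stable identifier (same address -> same id)."""
    def stable_id(match: re.Match) -> str:
        digest = hmac.new(key, match.group(0).lower().encode(), hashlib.sha256).hexdigest()
        return f"<person-{digest[:8]}>"
    return EMAIL.sub(stable_id, text)
```

Keying the hash with HMAC keeps identifiers stable across threads while preventing anyone without the key from recomputing an identifier from a guessed address.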
+
+The choice to land `<PROJECT_NAME>` at the ASF — rather than as an independent 
project or vendor offering — is load-bearing for this. **The ASF is a trust 
layer.** Maintainers who would (reasonably) hesitate to install a vendor's 
agent framework on their dev machine, or grant it access to their security 
mailing list, will more readily install one that comes through the same release 
process as the rest of their toolchain, signed by the same KEYS, governed by a 
PMC, held to the same softwa [...]
+
+This is the **first** priority — not the first among many. If a feature has to 
slow to keep this story honest, it slows.
+
+## Affordability and vendor neutrality — the public-good commitment
+
+Current state of agentic tooling for open source: maintainers doing the most 
agent-assisted work tend to have **expensive personal subscriptions** to one or 
more frontier-model providers, or **complimentary access** a vendor handed 
them. Both work, neither is sustainable, neither is fair. A maintainer in a 
country where a $200/month subscription is six weeks of pay does not get to 
participate. A project whose lead maintainer happens to have a vendor 
relationship gets capabilities its pee [...]
+
+The gap `<PROJECT_NAME>` exists to close, with an uncompromising long-term 
commitment:
+
+- **Vendor neutrality is non-negotiable, top to bottom.** Every mode runs against the project's chosen LLM, not a hard-coded one. The platform's contract with the model is well-defined enough that Anthropic's Claude (directly or via Amazon Bedrock), OpenAI, Google, locally-hosted Llama / Qwen / DeepSeek (Ollama, vLLM), and a future ASF-hosted endpoint are all valid backends with the same skill code on top. Skills are written against the contract, not the vendor.
+- **Local and self-hosted paths are first-class, not fallback.** A maintainer running Ollama gets the same skill catalogue as one running a frontier-model subscription. Local-only inference is also the simplest answer to most of the privacy concerns above: the content never leaves the machine.
+- **An ASF-hosted inference endpoint is on the long-term roadmap** — 
`inference.apache.org` (name TBD): a community-affordable, foundation-governed, 
audit-logged inference layer any open-source maintainer (ASF or not) can use to 
participate in agentic development without paying a vendor or accepting a 
vendor's gift. The long-term shape of "release software for the public good" in 
the agentic era.
+- **Economics get documented honestly.** `<PROJECT_NAME>`'s docs include a 
"what does each mode actually cost to run" page — token counts per typical 
invocation, per mode, per model class — so a maintainer evaluating adoption can 
make an informed call instead of guessing. The same data informs the case for 
the ASF-hosted endpoint when the community is ready to ask the question.
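"Skills are written against the contract, not the vendor" can be illustrated with a structural interface: skill code depends only on a one-method protocol, and each backend (vendor API, Ollama, vLLM, a future ASF endpoint) supplies its own adapter. A Python sketch; `ModelBackend`, `CannedBackend`, and `suggest_label` are hypothetical, and a real contract would carry much more than a single `complete` call:

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class ModelBackend(Protocol):
    """Hypothetical minimal contract between skill code and any model backend."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class CannedBackend:
    """Test double; a real adapter would wrap a vendor API or a local inference server."""
    reply: str

    def complete(self, prompt: str) -> str:
        return self.reply


def suggest_label(backend: ModelBackend, title: str, allowed: set) -> Optional[str]:
    """Skill logic: ask any backend for a label, then validate it deterministically."""
    prompt = (f"Pick exactly one label from {sorted(allowed)} for the issue "
              f"titled {title!r}. Reply with the label only.")
    answer = backend.complete(prompt).strip().lower()
    # A reply outside the allowed set becomes "no suggestion", never a new label.
    return answer if answer in allowed else None


print(suggest_label(CannedBackend("  Docs\n"), "Typo in README", {"docs", "bug"}))  # docs
```

Swapping backends then means swapping one adapter object; `suggest_label` itself does not change.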
+
+The ASF mission line — *"to provide software for the public good"* — has 
always meant the *running* software, not just the source code. For agentic 
tooling, the running software increasingly *is* the model, and the public-good 
commitment has to extend that far. **If `<PROJECT_NAME>` ends up being a thing 
only well-resourced maintainers can run, it has failed its core mission, 
regardless of how good the code is.**
+
+## Initial PMC composition (target)
+
+PMC composition matters more here than usual, because the project's social stakes are higher than its technical stakes. The PMC will be drawn from existing ASF members and, potentially, from Apache Airflow PMC members, since modes A/B/C are already live and in use there. Composition will be coordinated with Membership before the resolution is adopted.
+
+- **Size:** 7–9 members.
+- **Diversity:** at least three distinct organisational affiliations; no 
single employer holding a majority.
+- **Coverage:** at least two committers from each friendly-pilot PMC (Airflow, 
Arrow, ATR, or similar) for the user-side reality check; at least one committer 
with explicit responsibility for contributor experience, mentoring, and 
onboarding rather than just engineering; ASF Privacy and ASF Legal engaged from 
project start, given the contributor-data surface.
+- **Chair:** Jarek Potiuk, subject to PMC vote per Bylaws.
+
+ASF members for the roster:
+
+- Jarek Potiuk — Airflow PMC
+- Piotr Karwasz — Log4j PMC
+- Elad Kalif — Airflow PMC
+- Matthew Topol — Arrow PMC, Iceberg PMC
+- Pavan Kumar — Airflow PMC
+- Amogh Desai — Airflow PMC
+- Andrew Musselman — Mahout PMC
+- Justin Mclean — Incubator PMC, Training PMC
+- Jean-Baptiste Onofré — Incubator PMC, Polaris PMC, …
+
+The named PMC roster will accompany the resolution at the time of vote.
+
+## Required resources
+
+- **Mailing lists:** `private@<PROJECT_NAME>.apache.org`, 
`dev@<PROJECT_NAME>.apache.org`, `commits@<PROJECT_NAME>.apache.org`.
+- **Source control:** `github.com/apache/<PROJECT_NAME>`.
+- **Issue tracking:** GitHub Issues.
+- **Website:** `<PROJECT_NAME>.apache.org`.
+- **Release infrastructure:** `dist.apache.org` per standard ASF process.
+
+## Source and IP
+
+Green-field project. Existing project-agnostic code maintained by the Apache Airflow PMC, already designed to be reusable in and outside the ASF, could be donated to speed implementation; some related ideas are implemented in Gofannon.
+
+## External dependencies
+
+A current SKILL-based implementation already covers PR triage, security-issue management, and the maintainer review process. It is language-independent, since SKILLs are written in English. The deterministic-output scripts use standard Python ecosystem dependencies. No AI SDK integration is needed; the solution is a pure SKILL-based implementation that most agentic AI CLIs can run. Apache License compatibility has been verified.
+
+## Cryptography
+
+Standard TLS for HTTPS API calls. No novel cryptography. ECCN classification 
reviewed as not applicable.
+
+## Particular care
+
+The contributor experience is the most sensitive surface in any open-source 
project. Getting the tone wrong, mishandling a junior contributor, or letting 
an agent gatekeep where a human should is more damaging than any technical bug 
the project might ship — and the failure mode is not reversible by patch: a 
contributor who feels condescended-to by an agent and leaves does not get 
re-recruited.
+
+The project commits to:
+
+- **Mentoring-first sequencing** — Modes A and B before C and D.
+- **ASF Privacy and Legal involvement from project start**, not 
retrospectively.
+- **Contributor-sentiment evidence as a graduation criterion** for new 
automation modes alongside the standard technical-maturity criteria.
+- **Tight feedback loop** — the SKILL-based approach with human oversight lets 
agentic skills self-update from maintainer / triager feedback and contributor 
responses; mistakes get corrected and message tone is tuned to the 
communication style the PMC selects.
+
+## Ask of the Board
+
+Adopt the accompanying resolution establishing the Apache `<PROJECT_NAME>` 
Project as a Top-Level Project, with initial PMC roster as filed at the time of 
vote.
diff --git a/README.md b/README.md
index 3530bdb..273e87b 100644
--- a/README.md
+++ b/README.md
@@ -60,6 +60,13 @@ the marketplace path opens up. See
 [release-distribution](https://infra.apache.org/release-distribution.html)
 for the canonical distribution mechanism we will adopt.
 
+> [!IMPORTANT]
+> The motivation, scope, and design commitments behind this work
+> live in [`MISSION.md`](MISSION.md), the **draft**
+> project-establishment proposal for an Apache Top-Level Project built on
+> this framework. Read that for the *why*; this README is the
+> *how* once you've decided to adopt.
+
 ## How adoption works
 
 The framework uses a **snapshot + agentic-override** adoption
@@ -203,6 +210,7 @@ maintenance:
 
 ## Cross-references
 
+- [`MISSION.md`](MISSION.md) — **draft** project-establishment proposal: 
motivation, scope, design commitments, initial PMC composition target.
 - [`docs/setup/agentic-overrides.md`](docs/setup/agentic-overrides.md) — the 
contract between adopters who write overrides and framework skills that read 
them.
 - [`docs/prerequisites.md`](docs/prerequisites.md) — what a maintainer needs 
installed before invoking any framework skill (Claude Code, Gmail MCP, GitHub 
auth, browser, `uv`, etc.).
 - [`AGENTS.md`](AGENTS.md) — agent instructions, placeholder convention, 
framework conventions.
