On Sun, Feb 15, 2026 at 2:50 AM Peter Wang via NumPy-Discussion <[email protected]> wrote:
>
> Hey everyone,
>
> Sorry to be jumping in so late on this important thread. And thanks to everyone for the thoughtful discussion. NumPy is such a visible and important project that I'm sure what's decided here will have massive downstream consequences for the rest of the tech world.
>
> I'm happy to see that there seems to be an emerging consensus of "principles over policing": responsibility, understanding, transparency. One way I think we can make this more concrete is by framing it around *where* in the project a contribution lands, and what the "blast radius" looks like, in both space and time, if we later need to debug, rewrite, or (worst case) roll back code for legal/regulatory reasons.
>
> A few thoughts / a straw man of an "AI contribution policy":
>
> (1) Defining "Zones" of the project
>
> We can explicitly acknowledge that different areas of the codebase have different tolerance levels for AI-assisted generation. A "zone" doesn't have to mean a single folder. It can be:
>
> * a directory tree (e.g., `numpy/core`, `numpy/linalg`, `doc/`, `tools/`, `benchmarks/`),
> * part of the import namespace (public API, internal helpers),
> * or a semantic area (core algorithms, ABI-sensitive paths, numerical-stability-critical routines, tutorial content, CI glue, etc.).
>
> For example:
>
> * Inner ring (high scrutiny): core algorithms, numerics, ABI/API-critical code paths, anything performance-critical or subtle correctness-wise.
> * Middle ring (moderate): tests, refactors, coverage expansion, internal tooling, build/CI scripts.
> * Outer ring (low): examples, tutorials, onboarding docs, "glue" that's easy to replace, small utilities, beginner-facing content.
>
> The point is that the question of AI isn't a binary choice, nor is it e.g. "AI is forbidden in the core". Rather, the closer you get to code with a high blast radius, the more we should demand human legibility, reviewable provenance, and high confidence in correctness and licensing posture.
>
> (2) "Blast radius"
>
> When someone asks "should we accept AI-generated code here?", I think they already have an implicit model of the blast radius. We can render that model explicit with a few dimensions:
>
> * Complexity: Is this code easy to reason about? Does it involve numerical stability, tricky invariants, edge-case handling, low-level memory behavior, or algorithmic subtlety?
> * Impact & dependency surface: How many downstream things depend on this? Is it part of the public API? Widely imported? Does it affect core array semantics? If it changes, do we risk broad downstream breakage?
> * Stability & expected lifespan: Is this an area that tends to be stable for years (core numerics), or something we expect to churn (docs examples, CI harnesses)? The longer something is expected to persist, the higher the cost of "oops".
> * Rollback/replacement cost: If we had to remove it quickly, how painful would that be? How entangled is it with other code? How hard is it to recreate by hand?
> * Legibility/testability: Can we test it robustly? Can we write property tests? Are there known oracles? Is it feasible to get strong confidence quickly?
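>
> To make that concrete, here is the kind of thing I have in mind, purely as a straw-man sketch: a tiny, hypothetical zone map that a checklist or CI helper could consult. The file name, the path patterns, and the ring assignments are all illustrative, not a proposal for the actual boundaries.
>
>     # Hypothetical tools/ai_zones.py -- straw man only; the patterns and ring
>     # assignments are illustrative, not an actual policy proposal.
>     from fnmatch import fnmatch
>
>     # First match wins, so more specific patterns go first.
>     ZONES = [
>         ("numpy/*/tests/*", "middle"),  # tests, coverage expansion
>         ("numpy/core/*",    "inner"),   # core algorithms, ABI/API-critical paths
>         ("numpy/linalg/*",  "inner"),   # numerics, stability-critical routines
>         ("tools/*",         "middle"),  # internal tooling, build/CI scripts
>         ("benchmarks/*",    "middle"),
>         ("doc/*",           "outer"),   # examples, tutorials, onboarding docs
>     ]
>
>     def zone_for(path: str) -> str:
>         """Return the blast-radius ring for a changed file.
>
>         Unmatched paths default to the inner (highest-scrutiny) ring, on the
>         theory that it is cheaper to relax scrutiny later than to miss something.
>         """
>         for pattern, ring in ZONES:
>             if fnmatch(path, pattern):
>                 return ring
>         return "inner"
>
>     print(zone_for("numpy/core/src/umath/loops.c"))    # inner
>     print(zone_for("doc/source/user/quickstart.rst"))  # outer
>
> Whether this lives in a file, a wiki page, or just in reviewers' heads matters less than agreeing on roughly where the rings are.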
>
> (3) Transparency: make it concrete, not vague
>
> I think "please be transparent" is right. At a minimum, I think we need something like a lightweight attestation or affirmation from the contributor — not a legal affidavit, not a license, not an attempt to police workflow — but a structured statement that sets reviewer expectations and creates an audit trail.
>
> Something along these lines (details can be bikeshedded later):
>
> **AI Use Attestation (for the PR description template / checkboxes):**
>
> * Did you use an AI tool to generate or substantially modify code in this PR? (yes/no)
> * If yes:
>   * which tool/model (and ideally version/date — these change fast),
>   * which parts of the PR were AI-assisted (core logic vs. tests vs. docs vs. refactors),
>   * confirm: "I understand this code, I can explain it, and I'm responsible for it."
>
> Then scale the requested detail by zone / blast radius:
>
> * Outer/middle ring: model/tool + a high-level description is probably enough.
> * Inner ring / high blast radius: I'd like us to consider asking for more:
>   * the prompts (or at least the key prompts) used to generate the logic,
>   * any intermediate artifacts that help future maintainers understand how we got here (e.g., the "why" behind design choices, variants considered, constraints given to the model),
>   * and ideally a short human-written explanation of the algorithm and invariants (which is good practice regardless of AI).
>
> I can appreciate that this seems burdensome for small PRs fixing a tiny thing. But the above is just a straw man, and perhaps there are some nice simplifications we can engineer to make this as lightweight a part of the workflow as possible. (Perhaps even a stub Numpy_ai_contribution_guide.md that gives the code-gen LLM a template to fill out and include in the PR?)
>
> The analogy I have in mind: treat AI like a nondeterministic *semantic compiler*. With a normal compiler, we keep intermediate info when we care: flags, versions, debug symbols, build logs. For high-blast-radius code, the prompts and intermediate reasoning are effectively that metadata. Even if we don't store everything in-repo, capturing it in the PR discussion is valuable. It's like preserving the seed when we have to check in the output of an RNG.
>
> (4) Why keep prompts / artifacts? (forward-looking CI idea)
>
> One reason I care about preserving the trail: I can imagine a future "AI regression / reproducibility" check. Say it's 6-12 months from now and AI coding tools are even stronger. If we have prompts and model versions recorded for high-blast-radius contributions, we could run a periodic (maybe opt-in) workflow that:
>
> * replays historical prompts against a current/known model environment,
> * compares the generated output structurally (or semantically) to what we merged,
> * and validates that the merged implementation still matches expected numerical behavior (tests + known benchmarks).
>
> Even if we never automate this, having the trace helps humans debug: "What constraints were assumed?" "What source did it mirror?" "What was the intended invariant?"
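>
> To show roughly what I mean, here is a toy sketch of a machine-readable attestation record and the "scale the detail by zone" rule from (3), including the fields (prompts, tool/version) that a future replay check like the one in (4) would consume. Everything here is hypothetical: the field names, the checking logic, and the idea that any of this would be automated at all.
>
>     # Hypothetical attestation record + zone-scaled completeness check.
>     # Straw man only: names and rules are illustrative, not an actual NumPy workflow.
>     from dataclasses import dataclass, field
>
>     @dataclass
>     class AIAttestation:
>         used_ai: bool
>         tool: str = ""                      # e.g. model name + version/date
>         ai_assisted_parts: list[str] = field(default_factory=list)  # "core logic", "tests", "docs", ...
>         prompts: list[str] = field(default_factory=list)            # key prompts (inner ring only)
>         human_explanation: str = ""         # short human-written summary of algorithm/invariants
>         responsibility_confirmed: bool = False  # "I understand this code and am responsible for it"
>
>     def required_fields(zone: str) -> list[str]:
>         """Scale the requested detail by zone, per the straw man above."""
>         if zone == "inner":
>             return ["tool", "ai_assisted_parts", "prompts", "human_explanation"]
>         if zone == "middle":
>             return ["tool", "ai_assisted_parts"]
>         return ["tool"]  # outer ring
>
>     def missing_detail(att: AIAttestation, zone: str) -> list[str]:
>         """Return the fields a reviewer might still ask for."""
>         if not att.used_ai:
>             return []
>         missing = [name for name in required_fields(zone) if not getattr(att, name)]
>         if not att.responsibility_confirmed:
>             missing.append("responsibility_confirmed")
>         return missing
>
>     att = AIAttestation(used_ai=True, tool="example-model-2026-02",
>                         ai_assisted_parts=["tests"], responsibility_confirmed=True)
>     print(missing_detail(att, zone="middle"))  # []
>     print(missing_detail(att, zone="inner"))   # ['prompts', 'human_explanation']
>
> A real version would presumably just be checkboxes in the PR template, with anything this mechanical reserved for inner-ring changes, if used at all.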
>
> (5) Copyright / legality
>
> This is the part where I think a little conservatism is justified. The legal landscape around training data, derived works, and obligations around GPL/AGPL/LGPL is still evolving across jurisdictions. NumPy is permissively licensed, but that doesn't automatically insulate us from the provenance question if generated code ends up looking like something from a copyleft codebase. I can tell you for a fact that corporate legal-compliance folks will not hesitate to use the ban-hammer if, after some future court case, it's deemed that codebases like NumPy's are "tainted" and require roll-back.
>
> I'm not proposing we block AI tools categorically (that's neither realistic nor enforceable). But I do think it's reasonable to say:
>
> * in high-blast-radius zones, contributors should prefer tools with a clearer provenance and license posture, and we should be willing to ask for extra diligence (explanations, tests, and/or avoiding "the model wrote the entire algorithm" submissions);
> * in low-blast-radius zones, the risk/cost trade-off is different, and we can be more permissive.
>
> I also think we should explicitly acknowledge that this policy may evolve as jurisprudence and tooling clarity improve.
>
> As an additional note, over the last couple of years I have been actively working on a new "AI Rights" license & tech infrastructure to help give explicit attestation for authors of all copyrighted works, along the lines of CC Signals [1] or IETF AI Preferences [2]. I'm actually sending this from the AI Summit in Delhi where, as part of AI Commons, I'm convening allies from Creative Commons, Wikimedia, Internet Archive, Common Crawl, and others to align on a shared vision and workstreams. Those who are interested in my work on this can see the videos at links [3][4][5].
>
> If you only have time for one or two, I'd recommend [4] then [5], or just [5].
>
> I'm happy to chat in depth with any/all of you about these topics, but I want to be sensitive about not hijacking the NumPy list for my personal mad ravings, so we can take them off-list if the maintainers deem it too off-topic. If, on the other hand, y'all want to have a dialogue about this stuff here, I can think of no finer group to pressure-test my ideas. :-)
>
> Cheers,
> Peter
>
> (In the spirit of transparency and dogfooding: some parts of this email came from a thread summarization and initial dialog with GPT 4o and 5.2, although I only used the output as a starting point and edited heavily afterwards.)
>
> [1] https://creativecommons.org/ai-and-the-commons/cc-signals/
> [2] https://datatracker.ietf.org/wg/aipref/about/
> [3] https://www.youtube.com/watch?v=oZHl4NWaO7c
> [4] "AI for All": https://www.youtube.com/watch?v=TLZ9zXnluc8
> [5] "AI Training & The Data Commons in Crisis": https://www.youtube.com/watch?v=CdKxgT1o864

A prompt template that people can use with their code generation might be helpful. As an example of such, see https://x.com/WEschenbach/status/2022189308065796295. Matthew Rocklin had something similar in his discussion of hooks. The idea is to find ways to avoid some problems up front.
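
Something minimal along these lines, purely as an illustration (the wording below is made up, not taken from the linked example):

    # Hypothetical tools/ai_prompt_preamble.py -- a snippet contributors could
    # paste at the top of a code-generation prompt. Wording is illustrative only.
    PROMPT_PREAMBLE = """\
    You are helping draft a contribution to NumPy (BSD-3-Clause licensed).
    Constraints:
    - Do not reproduce code from GPL/AGPL/LGPL or other copyleft projects.
    - Prefer standard, well-documented algorithms and name the reference followed.
    - Keep the change small and reviewable; include tests for new behavior.
    - Call out any numerical-stability or ABI-sensitive assumptions explicitly.
    The result will be disclosed as AI-assisted in the pull request description.
    """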

Chuck

_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
