Hey everyone,

Sorry to be jumping in so late on this important thread, and thanks to everyone for the thoughtful discussion. NumPy is such a visible and important project that I'm sure whatever is decided here will have massive downstream consequences for the rest of the tech world.
I'm happy to see that there seems to be an emerging consensus of "principles over policing": responsibility, understanding, transparency. One way I think we can make this more concrete is by framing it around *where* in the project a contribution lands, and what the "blast radius" looks like, in both space and time, if we later need to debug, rewrite, or (worst case) roll back code for legal/regulatory reasons.

A few thoughts / straw man of an "AI contribution policy":

(1) Defining "Zones" of the project

We can explicitly acknowledge that different areas of the codebase have different tolerance levels for AI-assisted generation. A "zone" doesn't have to mean a single folder. It can be:

* a directory tree (e.g., `numpy/core`, `numpy/linalg`, `doc/`, `tools/`, `benchmarks/`),
* part of the import namespace (public API, internal helpers),
* or a semantic area (core algorithms, ABI-sensitive paths, numerical stability-critical routines, tutorial content, CI glue, etc.).

For example:

* Inner ring (high scrutiny): core algorithms, numerics, ABI/API-critical code paths, anything performance-critical or subtle correctness-wise.
* Middle ring (moderate): tests, refactors, coverage expansion, internal tooling, build/CI scripts.
* Outer ring (low): examples, tutorials, onboarding docs, "glue" that's easy to replace, small utilities, beginner-facing content.

This is explicitly saying that the question of AI isn't a binary choice, nor, e.g., "AI is forbidden in the core". Rather, the closer you get to code with a high blast radius, the more we should demand human legibility, reviewable provenance, and high confidence in correctness and licensing posture.

(2) "Blast radius"

When someone asks "should we accept AI-generated code here?", I think they already have an implicit model of the blast radius. We can render that model explicit with a few dimensions (a rough sketch of how the zones and these dimensions could fit together follows after this list):

* Complexity: Is this code easy to reason about? Does it involve numerical stability, tricky invariants, edge-case handling, low-level memory behavior, or algorithmic subtlety?
* Impact & dependency surface: How many downstream things depend on this? Is it part of the public API? Widely imported? Does it affect core array semantics? If it changes, do we risk broad downstream breakage?
* Stability & expected lifespan: Is this an area that tends to be stable for years (core numerics), or something we expect to churn (docs examples, CI harnesses)? The longer something is expected to persist, the higher the cost of "oops".
* Rollback/replacement cost: If we had to remove it quickly, how painful would that be? How entangled is it with other code? How hard is it to recreate by hand?
* Legibility/testability: Can we test it robustly? Can we write property tests? Are there known oracles? Is it feasible to get strong confidence quickly?
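To make that a bit less abstract, here is a minimal, purely illustrative sketch of what a path-to-zone lookup plus a blast-radius checklist could look like as a review aid. The module layout, the globs, the ring assignments, and the function names are all hypothetical; this is not proposed NumPy tooling.

```python
# Illustrative sketch only: map changed paths to zones and scale the reviewer
# checklist accordingly. Globs and ring assignments are placeholders.
from fnmatch import fnmatch

ZONE_GLOBS = [
    # Checked in order; first match wins. Test paths come before core paths so
    # that tests under numpy/ land in the middle ring, as in the straw man above.
    ("middle", ["numpy/**/tests/*", "tools/*", ".github/workflows/*", "benchmarks/*"]),
    ("inner",  ["numpy/_core/*", "numpy/core/*", "numpy/linalg/*", "numpy/fft/*"]),
    ("outer",  ["doc/*", "*.md", "*.rst"]),
]

# Blast-radius questions a reviewer (or a bot commenting on a PR) could ask;
# "yes" answers push toward more scrutiny.
BLAST_RADIUS_CHECKLIST = [
    "Does it touch numerical stability, tricky invariants, or low-level memory behavior?",
    "Is it part of the public API or widely imported internally?",
    "Is it expected to persist for years rather than churn?",
    "Would rolling it back be painful or entangled with other code?",
    "Is it hard to test robustly (no property tests, no known oracle)?",
]

def zone_for_path(path: str) -> str:
    """Return the zone of the first matching glob, defaulting to moderate scrutiny."""
    for zone, globs in ZONE_GLOBS:
        if any(fnmatch(path, g) for g in globs):
            return zone
    return "middle"

def zone_for_pr(changed_paths: list[str]) -> str:
    """A PR inherits the most restrictive zone among the files it touches."""
    rank = {"inner": 0, "middle": 1, "outer": 2}
    return min((zone_for_path(p) for p in changed_paths), key=rank.__getitem__)

def review_prompts(zone: str) -> list[str]:
    """Inner-ring changes get the full checklist; outer-ring ones only a provenance sanity check."""
    if zone == "inner":
        return BLAST_RADIUS_CHECKLIST
    if zone == "middle":
        return BLAST_RADIUS_CHECKLIST[:2]
    return ["Is the provenance/licensing posture of the generated text plausible?"]

if __name__ == "__main__":
    # e.g. a PR touching linalg internals plus a docs page is treated as inner ring
    print(zone_for_pr(["numpy/linalg/_linalg.py", "doc/source/index.rst"]))  # -> inner
```

The point of the toy is just that "which ring is this PR in?" can be answered mechanically, so the human effort goes into the blast-radius questions rather than the classification.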
(3) Transparency: make it concrete, not vague

I think "please be transparent" is right. At a minimum, I think we need something like a lightweight attestation or affirmation from the contributor: not a legal affidavit, not a license, not an attempt to police workflow, but a structured statement that sets reviewer expectations and creates an audit trail. Something along these lines (details can be bikeshedded later):

**AI Use Attestation (for PR description template / checkboxes):**

* Did you use an AI tool to generate or substantially modify code in this PR? (yes/no)
* If yes:
  * which tool/model (and ideally version/date, since these change fast),
  * what parts of the PR were AI-assisted (core logic vs. tests vs. docs vs. refactors),
  * confirm: "I understand this code, I can explain it, and I'm responsible for it."

Then scale the requested detail by zone / blast radius:

* Outer/middle ring: model/tool plus a high-level description is probably enough.
* Inner ring / high blast radius: I'd like us to consider asking for more:
  * the prompts (or at least the key prompts) used to generate the logic,
  * any intermediate artifacts that help future maintainers understand how we got here (e.g., the "why" behind design choices, variants considered, constraints given to the model),
  * and ideally a short human-written explanation of the algorithm and invariants (which is good practice regardless of AI).

I can appreciate that this seems burdensome for small PRs fixing a tiny thing. But the above is just a straw man, and perhaps there are some nice simplifications we can engineer to make this as lightweight a part of the workflow as possible. (Perhaps even a stub Numpy_ai_contribution_guide.md that gives the code-gen LLM a template to fill out and include in the PR?)

The analogy I have in mind: treat AI like a nondeterministic *semantic compiler*. With a normal compiler, we keep intermediate info when we care: flags, versions, debug symbols, build logs. For high-blast-radius code, the prompts and intermediate reasoning are effectively that metadata. Even if we don't store everything in-repo, capturing it in the PR discussion is valuable. It is essentially like preserving the seed when we have to check in the output of an RNG.

(4) Why keep prompts / artifacts? (forward-looking CI idea)

One reason I care about preserving the trail: I can imagine a future "AI regression / reproducibility" check (a toy sketch follows at the end of this section). Say it's 6–12 months from now and AI coding tools are even stronger. If we have prompts and model versions recorded for high-blast-radius contributions, we could run a periodic (maybe opt-in) workflow that:

* replays historical prompts against a current/known model environment,
* compares the generated output structurally (or semantically) to what we merged,
* and validates that the merged implementation still matches expected numerical behavior (tests + known benchmarks).

Even if we never automate this, having the trace helps humans debug: "what constraints were assumed?", "what source did it mirror?", "what was the intended invariant?"
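To make the attestation and the replay idea slightly more tangible, here is a rough, purely illustrative sketch of how the attested metadata might be captured as structured data and later consumed by such a check. The field names, zone labels, and the injected `replay` callable are all hypothetical placeholders; nothing here is an existing NumPy workflow or a worked-out design.

```python
# Purely illustrative: one way the AI-use attestation could be recorded per PR
# and later fed to an opt-in reproducibility check. The model-replay step is
# injected as a callable because no specific model API is assumed here.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AIAttestation:
    used_ai: bool
    tool: str = ""                                   # e.g. model name
    model_version: str = ""                          # version/date, since these change fast
    assisted_parts: list[str] = field(default_factory=list)  # "core logic", "tests", "docs", ...
    prompts: list[str] = field(default_factory=list)         # kept only for high-blast-radius PRs
    human_explanation: str = ""                      # short write-up of the algorithm/invariants
    understood_and_responsible: bool = False         # "I understand this code and I'm responsible for it"

def required_fields(zone: str) -> list[str]:
    """Scale the requested detail by zone, per the straw man above."""
    base = ["used_ai", "tool", "assisted_parts", "understood_and_responsible"]
    if zone in ("outer", "middle"):
        return base
    # inner ring / high blast radius: also keep prompts and a human explanation
    return base + ["model_version", "prompts", "human_explanation"]

def reproducibility_report(att: AIAttestation, merged_source: str,
                           replay: Callable[[str, str, str], str]) -> dict:
    """Sketch of the future opt-in check: replay recorded prompts against whatever
    model environment `replay` wraps at the time, and compare to what was merged.

    A real check would compare structurally/semantically (ASTs, property tests,
    numerical benchmarks), not by string equality; this only shows the shape.
    """
    matches = [replay(p, att.tool, att.model_version).strip() == merged_source.strip()
               for p in att.prompts]
    return {"prompts_replayed": len(matches),
            "all_match_merged": all(matches) if matches else None}
```

Even in this toy form, `required_fields()` is where the zone / blast-radius decision actually bites, and the `prompts` field is the "RNG seed" the compiler analogy argues for keeping.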
(5) Copyright / legality

This is the part where I think a little conservatism is justified. The legal landscape around training data, derived works, and obligations around GPL/AGPL/LGPL is still evolving across jurisdictions. NumPy is permissively licensed, but that doesn't automatically insulate us from the provenance question if generated code ends up looking like something from a copyleft codebase. I can tell you for a fact that corporate legal compliance folks will not hesitate to use the ban-hammer if, after some future court case, it's deemed that codebases like NumPy's are "tainted" and require rollback.

I'm not proposing we block AI tools categorically (that's neither realistic nor enforceable). But I do think it's reasonable to say:

* in high-blast-radius zones, contributors should prefer tools with a clearer provenance and license posture, and we should be willing to ask for extra diligence (explanations, tests, and/or avoiding "the model wrote the entire algorithm" submissions);
* in low-blast-radius zones, the risk/cost trade-off is different, and we can be more permissive.

I also think we should explicitly acknowledge that this policy may evolve as jurisprudence and tooling clarity improve.

As an additional note, over the last couple of years I have been actively working on a new "AI Rights" license & tech infrastructure to help give explicit attestation for authors of all copyrighted works, along the lines of CC Signals [1] or the IETF AI Preferences work [2]. I'm actually sending this from the AI Summit in Delhi where, as part of AI Commons, I'm convening allies from Creative Commons, Wikimedia, Internet Archive, Common Crawl, and others to align on a shared vision and workstreams. Those who are interested in my work on this can see the videos at links [3][4][5]. If you're going to watch just one, I'd recommend [4] then [5], or just [5].

I'm happy to chat in depth with any/all of you about these topics, but I want to be sensitive about not hijacking the NumPy list for my personal mad ravings, so we can take them off-list if the maintainers deem it too off-topic. If, on the other hand, y'all want to have a dialogue about this stuff here, I can think of no finer group to pressure-test my ideas. :-)

Cheers,
Peter

(In the spirit of transparency and dogfooding: some parts of this email came from a thread summarization and initial dialogue with GPT 4o and 5.2, although I only used the output as a starting point and edited heavily afterwards.)

[1] https://creativecommons.org/ai-and-the-commons/cc-signals/
[2] https://datatracker.ietf.org/wg/aipref/about/
[3] https://www.youtube.com/watch?v=oZHl4NWaO7c
[4] "AI for All": https://www.youtube.com/watch?v=TLZ9zXnluc8
[5] "AI Training & The Data Commons in Crisis": https://www.youtube.com/watch?v=CdKxgT1o864
