Hey!
Thanks a lot for chiming in, still watching the videos!
FWIW, I have come around: I think we should add something (a lot of
projects are discussing this, and I would be happy to just steal a
policy from one of them).
I would still focus on the transparency and social part rather than
copyright, because I think that is the part that affects us more in
practice; but yeah, I guess copyright deserves a place.
It may well be nice to require a checkbox and a short note in the PR
template. I am not sure what will work there, but how much detail we
need may also depend a bit on the "risk assessment".
Now for copyright issues, I am still a bit unclear on what we should
ask for beyond transparency from the contributor (I am happy to write
that non-transparency or unclear provenance is likely to prompt us to
just close the PR).
For the maintainer, I think the "blast radius" framework could be very
useful, and it may be nice to flesh it out. (I don't think there is
anything NumPy-specific about it; in my mind, (1) is basically a set of
examples for the blast radius?)
I think the important part there might be to have a few rough examples
and where they land (though I would find it very hard to do this with
any certainty!).
I like the risk-matrix approach, so I think this would look something
like:
Risk of copyright |            "Blast radius"
infringement      | very small | ...
------------------|------------|------------------------
very low          |            |
low               |            |
...               |            |
We would then need examples that vary on both axes, and a suggested
(very rough) line in the matrix at which point you should start to be
very careful and do extra verification steps (or just close the PR if
you don't want to do those).
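Just to sketch the shape of the thing (the level names and the
suggested actions below are made-up placeholders, not proposed
cut-offs), one way to encode such a matrix in a few lines of Python
would be:

    # Rough sketch only: the levels and the actions are invented
    # placeholders, not agreed-upon policy.
    COPYRIGHT_RISK = ["very low", "low", "medium", "high"]
    BLAST_RADIUS = ["very small", "small", "large", "very large"]

    # Hypothetical cells that would call for more than normal review.
    EXTRA_CARE = {
        ("medium", "large"): "extra verification (provenance, tests)",
        ("high", "small"): "extra verification (provenance, tests)",
        ("high", "large"): "extra verification or close the PR",
        ("high", "very large"): "extra verification or close the PR",
    }

    def suggested_action(risk, blast_radius):
        return EXTRA_CARE.get((risk, blast_radius), "normal review")

The hard part is of course filling in the cells with defensible
examples, not writing the lookup.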
At least that is how I would like to approach this when in doubt, but
beyond being fairly confident that we have only had pretty safe PRs so
far, I am not sure I could build up such a matrix myself.
- Sebastian
On 2026-02-15 10:49, Peter Wang via NumPy-Discussion wrote:
Hey everyone,
Sorry to be jumping in so late on this important thread. And thanks to
everyone for the thoughtful discussion. NumPy is such a visible and
important project that I'm sure what's decided here will have massive
downstream consequences for the rest of the tech world.
I'm happy to see that there seems to be an emerging consensus of
“principles over policing”: responsibility, understanding,
transparency. One way I think we can make this more concrete is by
framing it around *where* in the project a contribution lands, and
what the “blast radius” looks like, in both space and time, if we
later need to debug, rewrite, or (worst case) roll back code for
legal/regulatory reasons.
A few thoughts / straw man of an "AI contribution policy":
(1) Defining “Zones” of the project
We can explicitly acknowledge that different areas of the codebase
have different tolerance levels for AI-assisted generation.
"Zone" doesn’t have to mean a single folder. It can be:
* a directory tree (e.g., `numpy/core`, `numpy/linalg`, `doc/`,
`tools/`, `benchmarks/`),
* part of the import namespace (public API, internal helpers),
* or a semantic area (core algorithms, ABI-sensitive paths, numerical
stability-critical routines, tutorial content, CI glue, etc.).
For example:
* Inner ring (high scrutiny): core algorithms, numerics,
ABI/API-critical code paths, anything performance-critical or subtle
correctness-wise.
* Middle ring (moderate): tests, refactors, coverage expansion,
internal tooling, build/CI scripts.
* Outer ring (low): examples, tutorials, onboarding docs, “glue”
that’s easy to replace, small utilities, beginner-facing content.
This is explicitly saying that the question of AI use isn't a binary
choice (e.g., "AI is forbidden in the core"). Rather, the closer you
get to code with high blast radius, the more we should demand human
legibility, reviewable provenance, and high confidence in correctness
and licensing posture.
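To make the zone idea concrete, a reviewer-side helper could classify
changed paths into rings with something as small as the sketch below
(the patterns and ring assignments are illustrative guesses on my
part, not a proposed mapping of the actual tree):

    import fnmatch

    # Illustrative only: both the patterns and the ring assignments
    # are guesses for the sake of the sketch, not a proposed
    # classification of the NumPy tree.
    ZONE_PATTERNS = [
        ("inner", "numpy/core/*"),
        ("inner", "numpy/linalg/*"),
        ("middle", "numpy/*/tests/*"),
        ("middle", "tools/*"),
        ("middle", "benchmarks/*"),
        ("outer", "doc/*"),
    ]

    def zone_for_path(path):
        """Return the first matching ring for a changed file."""
        for ring, pattern in ZONE_PATTERNS:
            if fnmatch.fnmatch(path, pattern):
                return ring
        return "middle"  # unknown paths get at least moderate scrutiny

Whatever form it takes, the point is that the mapping is written down
and reviewable rather than living only in maintainers' heads.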
(2) "Blast radius"
When someone asks “should we accept AI-generated code here?”, I
think they have an implicit model about "blast radius" already. We can
render that model explicit with a few dimensions:
* Complexity: Is this code easy to reason about? Does it involve
numerical stability, tricky invariants, edge-case handling, low-level
memory behavior, or algorithmic subtlety?
* Impact & dependency surface: How many downstream things depend on
this? Is it part of public API? Widely imported? Affects core array
semantics? If it changes, do we risk broad downstream breakage?
* Stability & expected lifespan: Is this an area that tends to be
stable for years (core numerics), or something we expect to churn
(docs examples, CI harnesses)? The longer something is expected to
persist, the higher the cost of “oops”.
* Rollback/Replacement cost: If we had to remove it quickly, how
painful would that be? How entangled is it with other code? How hard
is it to recreate by hand?
* Legibility / testability: Can we test it robustly? Can we write
property tests? Are there known oracles? Is it feasible to get strong
confidence quickly?
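If it helps, these dimensions could even be jotted down as a tiny
structured checklist during review. Purely a sketch, with an arbitrary
1–3 scale and made-up field names:

    from dataclasses import dataclass

    # Placeholder sketch: the fields mirror the dimensions above,
    # scored 1 (low concern) to 3 (high concern); the scale is
    # arbitrary.
    @dataclass
    class BlastRadius:
        complexity: int     # numerical stability, invariants, subtlety
        impact: int         # public API / downstream dependency surface
        lifespan: int       # how long we expect the code to persist
        rollback_cost: int  # how painful a quick removal would be
        legibility: int     # how hard it is to test or explain

        def score(self) -> int:
            """Crude aggregate; a real policy would weigh these by hand."""
            return (self.complexity + self.impact + self.lifespan
                    + self.rollback_cost + self.legibility)

A high total would push a PR toward inner-ring treatment; a real
policy would of course apply judgment rather than a formula.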
(3) Transparency: make it concrete, not vague
I think “please be transparent” is right. At a minimum, I think we
need something like a lightweight attestation or affirmation from the
contributor — not a legal affidavit, not a license, not an attempt
to police workflow — but a structured statement that sets reviewer
expectations and creates an audit trail.
Something along these lines (details can be bikeshedded later):
**AI Use Attestation (for PR description template / checkboxes):**
* Did you use an AI tool to generate or substantially modify code in
this PR? (yes/no)
* If yes:
* which tool/model (and ideally version/date — these change fast),
* what parts of the PR were AI-assisted (core logic vs tests vs docs
vs refactors),
* confirm: “I understand this code, I can explain it, and I’m
responsible for it.”
Then scale the requested detail by zone / blast radius:
* Outer/middle ring: model/tool + high-level description is probably
enough.
* Inner ring / high blast radius: I’d like us to consider asking for
more:
* the prompts (or at least the key prompts) used to generate the
logic,
* any intermediate artifacts that help future maintainers understand
how we got here (e.g., the “why” behind design choices, variants
considered, constraints given to the model),
* and ideally a short human-written explanation of the algorithm and
invariants (which is good practice regardless of AI).
I can appreciate that this seems burdensome for small PRs fixing a
tiny thing. But the above is just a straw-man and perhaps there are
some nice simplifications we can engineer to make this as lightweight
a part of the workflow as possible. (Perhaps even a stub
Numpy_ai_contribution_guide.md that gives the code-gen LLM a template
to fill out and include in the PR?)
The analogy I have in mind: treat AI like a nondeterministic *semantic
compiler*. With a normal compiler, we keep intermediate info when we
care: flags, versions, debug symbols, build logs. For high blast
radius code, the prompts and intermediate reasoning are effectively
that metadata. Even if we don’t store everything in-repo, capturing
it in the PR discussion is valuable. It's much like preserving the
seed when we have to check in the output of an RNG.
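Concretely, with NumPy's own RNG, recording the seed is what lets
anyone regenerate and verify the committed numbers later:

    import numpy as np

    # Because the seed is recorded, rerunning this reproduces the
    # exact values that were checked in.
    rng = np.random.default_rng(12345)
    print(rng.standard_normal(3))

The prompts and model versions would play the same role for generated
code.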
(4) Why keep prompts / artifacts? (forward-looking CI idea)
One reason I care about preserving the trail: I can imagine a future
“AI regression / reproducibility” check.
Say it’s 6–12 months from now and AI coding tools are even
stronger. If we have prompts and model versions recorded for
high-blast-radius contributions, we could run a periodic (maybe
opt-in) workflow that:
* replays historical prompts against a current/known model
environment,
* compares the generated output structurally (or semantically) to what
we merged,
* and validates that the merged implementation still matches expected
numerical behavior (tests + known benchmarks).
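Purely as a sketch of what such a check could look like (the record
format, the `generate` callable, and the similarity threshold below
are all invented for illustration, not an existing tool):

    import difflib
    import json
    import subprocess

    def replay_check(record_path, generate):
        """Replay a stored prompt against a current model and compare.

        The JSON record ({"prompt", "model", "merged_file", "tests"})
        and the generate(prompt, model) -> str callable are
        hypothetical; this is a skeleton, not an existing workflow.
        """
        with open(record_path) as f:
            record = json.load(f)

        regenerated = generate(record["prompt"], record["model"])
        with open(record["merged_file"]) as f:
            merged = f.read()

        # Structural comparison: flag large drift for a human to review.
        similarity = difflib.SequenceMatcher(None, regenerated, merged).ratio()
        if similarity < 0.8:  # arbitrary threshold, for illustration only
            print(f"drift vs. merged code ({similarity:.2f}); flag for review")

        # Behavioural check: the merged code must still pass its tests.
        subprocess.run(["pytest", *record["tests"]], check=True)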
Even if we never automate this, having the trace helps humans debug:
“what constraints were assumed?” “what source did it mirror?”
“what was the intended invariant?”
(5) Copyright / legality
This is the part where I think a little conservatism is justified. The
legal landscape around training data, derived works, and obligations
around GPL/AGPL/LGPL is still evolving across jurisdictions. NumPy is
permissively licensed, but that doesn’t automatically insulate us
from the provenance question if generated code ends up looking like
something from a copyleft codebase. I can tell you for a fact that
corporate legal compliance folks will not hesitate to use the
ban-hammer if, after some future court case, it's deemed that
codebases like NumPy's are "tainted" and require a rollback.
I’m not proposing we block AI tools categorically (that’s neither
realistic nor enforceable). But I do think it’s reasonable to say:
* in high blast radius zones, contributors should prefer tools with
clearer provenance and license posture, and we should be willing to
ask for extra diligence (explanations, tests, and/or avoiding “model
wrote the entire algorithm” submissions);
* in low blast radius zones, the risk/cost trade is different, and we
can be more permissive.
I also think we should explicitly acknowledge that this policy may
evolve as jurisprudence and tooling clarity improve.
As an additional note, over the last couple of years I have been
actively working on a new "AI Rights" license & tech infrastructure to
help give explicit attestation for authors of all copyrighted works,
along the lines of CC Signals[1] or IETF AI Preferences[2]. I'm
actually sending this from the AI Summit in Delhi where, as part of AI
Commons, I'm convening allies from Creative Commons, Wikimedia,
Internet Archive, Common Crawl, and others to align on shared vision
and workstreams. Those who are interested in my work on this can see
the videos at links [3][4][5].
If you only have time to watch one or two, I'd recommend [4] then [5], or just [5].
I'm happy to chat in depth with any/all of you about these topics, but
I want to be sensitive about not hijacking the NumPy list for my
personal mad ravings, so we can take them off-list if the maintainers
deem it too off-topic. If, on the other hand, y'all want to have a
dialogue about this stuff here, I can think of no finer group to
pressure-test my ideas. :-)
Cheers,
Peter
(In the spirit of transparency and dogfooding: some parts of this
email came from a thread summarization and initial dialog with GPT 4o
and 5.2, although I only used the output as a starting point and
edited heavily afterwards.)
[1] https://creativecommons.org/ai-and-the-commons/cc-signals/
[2] https://datatracker.ietf.org/wg/aipref/about/
[3] https://www.youtube.com/watch?v=oZHl4NWaO7c
[4] "AI for All": https://www.youtube.com/watch?v=TLZ9zXnluc8
[5] "AI Training & The Data Commons in Crisis":
https://www.youtube.com/watch?v=CdKxgT1o864
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]