On Sun, Feb 15, 2026 at 2:50 AM Peter Wang via NumPy-Discussion <[email protected]> wrote:
>
> Hey everyone,
>
> Sorry to be jumping in so late on this important thread. And thanks to everyone for the thoughtful discussion. NumPy is such a visible and important project that I'm sure what's decided here will have massive downstream consequences for the rest of the tech world.
>
> I'm happy to see that there seems to be an emerging consensus of "principles over policing": responsibility, understanding, transparency. One way I think we can make this more concrete is by framing it around *where* in the project a contribution lands, and what the "blast radius" looks like, in both space and time, if we later need to debug, rewrite, or (worst case) roll back code for legal/regulatory reasons.
>
> A few thoughts / a straw man of an "AI contribution policy":
>
> (1) Defining "Zones" of the project
>
> We can explicitly acknowledge that different areas of the codebase have different tolerance levels for AI-assisted generation. A "zone" doesn't have to mean a single folder. It can be:
>
> * a directory tree (e.g., `numpy/core`, `numpy/linalg`, `doc/`, `tools/`, `benchmarks/`),
> * part of the import namespace (public API, internal helpers),
> * or a semantic area (core algorithms, ABI-sensitive paths, numerical-stability-critical routines, tutorial content, CI glue, etc.).
>
> For example:
>
> * Inner ring (high scrutiny): core algorithms, numerics, ABI/API-critical code paths, anything performance-critical or subtle correctness-wise.
> * Middle ring (moderate): tests, refactors, coverage expansion, internal tooling, build/CI scripts.
> * Outer ring (low): examples, tutorials, onboarding docs, "glue" that's easy to replace, small utilities, beginner-facing content.
>
> The point is that the question of AI isn't a binary choice, nor is it e.g. "AI is forbidden in the core". Rather, the closer you get to code with a high blast radius, the more we should demand human legibility, reviewable provenance, and high confidence in correctness and licensing posture.
>
> (2) "Blast radius"
>
> When someone asks "should we accept AI-generated code here?", I think they already have an implicit model of the blast radius. We can render that model explicit with a few dimensions:
>
> * Complexity: Is this code easy to reason about? Does it involve numerical stability, tricky invariants, edge-case handling, low-level memory behavior, or algorithmic subtlety?
> * Impact & dependency surface: How many downstream things depend on this? Is it part of the public API? Widely imported? Does it affect core array semantics? If it changes, do we risk broad downstream breakage?
> * Stability & expected lifespan: Is this an area that tends to be stable for years (core numerics), or something we expect to churn (docs examples, CI harnesses)? The longer something is expected to persist, the higher the cost of "oops".
> * Rollback/replacement cost: If we had to remove it quickly, how painful would that be? How entangled is it with other code? How hard is it to recreate by hand?
> * Legibility/testability: Can we test it robustly? Can we write property tests? Are there known oracles? Is it feasible to get strong confidence quickly?
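>
> To make that concrete, here is the kind of thing I have in mind, purely as a straw-man sketch: a tiny, hypothetical zone map that a checklist or CI helper could consult. The file name, the path patterns, and the ring assignments are all illustrative, not a proposal for the actual boundaries.
>
>     # Hypothetical tools/ai_zones.py -- straw man only; the patterns and ring
>     # assignments are illustrative, not an actual policy proposal.
>     from fnmatch import fnmatch
>
>     # First match wins, so more specific patterns go first.
>     ZONES = [
>         ("numpy/*/tests/*", "middle"),  # tests, coverage expansion
>         ("numpy/core/*",    "inner"),   # core algorithms, ABI/API-critical paths
>         ("numpy/linalg/*",  "inner"),   # numerics, stability-critical routines
>         ("tools/*",         "middle"),  # internal tooling, build/CI scripts
>         ("benchmarks/*",    "middle"),
>         ("doc/*",           "outer"),   # examples, tutorials, onboarding docs
>     ]
>
>     def zone_for(path: str) -> str:
>         """Return the blast-radius ring for a changed file.
>
>         Unmatched paths default to the inner (highest-scrutiny) ring, on the
>         theory that it is cheaper to relax scrutiny later than to miss something.
>         """
>         for pattern, ring in ZONES:
>             if fnmatch(path, pattern):
>                 return ring
>         return "inner"
>
>     print(zone_for("numpy/core/src/umath/loops.c"))    # inner
>     print(zone_for("doc/source/user/quickstart.rst"))  # outer
>
> Whether this lives in a file, a wiki page, or just in reviewers' heads matters less than agreeing on roughly where the rings are.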
>
> (3) Transparency: make it concrete, not vague
>
> I think "please be transparent" is right. At a minimum, I think we need something like a lightweight attestation or affirmation from the contributor — not a legal affidavit, not a license, not an attempt to police workflow — but a structured statement that sets reviewer expectations and creates an audit trail.
>
> Something along these lines (details can be bikeshedded later):
>
> **AI Use Attestation (for the PR description template / checkboxes):**
>
> * Did you use an AI tool to generate or substantially modify code in this PR? (yes/no)
> * If yes:
>   * which tool/model (and ideally version/date — these change fast),
>   * which parts of the PR were AI-assisted (core logic vs. tests vs. docs vs. refactors),
>   * confirm: "I understand this code, I can explain it, and I'm responsible for it."
>
> Then scale the requested detail by zone / blast radius:
>
> * Outer/middle ring: model/tool + a high-level description is probably enough.
> * Inner ring / high blast radius: I'd like us to consider asking for more:
>   * the prompts (or at least the key prompts) used to generate the logic,
>   * any intermediate artifacts that help future maintainers understand how we got here (e.g., the "why" behind design choices, variants considered, constraints given to the model),
>   * and ideally a short human-written explanation of the algorithm and invariants (which is good practice regardless of AI).
>
> I can appreciate that this seems burdensome for small PRs fixing a tiny thing. But the above is just a straw man, and perhaps there are some nice simplifications we can engineer to make this as lightweight a part of the workflow as possible. (Perhaps even a stub Numpy_ai_contribution_guide.md that gives the code-gen LLM a template to fill out and include in the PR?)
>
> The analogy I have in mind: treat AI like a nondeterministic *semantic compiler*. With a normal compiler, we keep intermediate info when we care: flags, versions, debug symbols, build logs. For high-blast-radius code, the prompts and intermediate reasoning are effectively that metadata. Even if we don't store everything in-repo, capturing it in the PR discussion is valuable. It's like preserving the seed when we have to check in the output of an RNG.
>
> (4) Why keep prompts / artifacts? (forward-looking CI idea)
>
> One reason I care about preserving the trail: I can imagine a future "AI regression / reproducibility" check. Say it's 6-12 months from now and AI coding tools are even stronger. If we have prompts and model versions recorded for high-blast-radius contributions, we could run a periodic (maybe opt-in) workflow that:
>
> * replays historical prompts against a current/known model environment,
> * compares the generated output structurally (or semantically) to what we merged,
> * and validates that the merged implementation still matches expected numerical behavior (tests + known benchmarks).
>
> Even if we never automate this, having the trace helps humans debug: "What constraints were assumed?" "What source did it mirror?" "What was the intended invariant?"
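>
> To show roughly what I mean, here is a toy sketch of a machine-readable attestation record and the "scale the detail by zone" rule from (3), including the fields (prompts, tool/version) that a future replay check like the one in (4) would consume. Everything here is hypothetical: the field names, the checking logic, and the idea that any of this would be automated at all.
>
>     # Hypothetical attestation record + zone-scaled completeness check.
>     # Straw man only: names and rules are illustrative, not an actual NumPy workflow.
>     from dataclasses import dataclass, field
>
>     @dataclass
>     class AIAttestation:
>         used_ai: bool
>         tool: str = ""                      # e.g. model name + version/date
>         ai_assisted_parts: list[str] = field(default_factory=list)  # "core logic", "tests", "docs", ...
>         prompts: list[str] = field(default_factory=list)            # key prompts (inner ring only)
>         human_explanation: str = ""         # short human-written summary of algorithm/invariants
>         responsibility_confirmed: bool = False  # "I understand this code and am responsible for it"
>
>     def required_fields(zone: str) -> list[str]:
>         """Scale the requested detail by zone, per the straw man above."""
>         if zone == "inner":
>             return ["tool", "ai_assisted_parts", "prompts", "human_explanation"]
>         if zone == "middle":
>             return ["tool", "ai_assisted_parts"]
>         return ["tool"]  # outer ring
>
>     def missing_detail(att: AIAttestation, zone: str) -> list[str]:
>         """Return the fields a reviewer might still ask for."""
>         if not att.used_ai:
>             return []
>         missing = [name for name in required_fields(zone) if not getattr(att, name)]
>         if not att.responsibility_confirmed:
>             missing.append("responsibility_confirmed")
>         return missing
>
>     att = AIAttestation(used_ai=True, tool="example-model-2026-02",
>                         ai_assisted_parts=["tests"], responsibility_confirmed=True)
>     print(missing_detail(att, zone="middle"))  # []
>     print(missing_detail(att, zone="inner"))   # ['prompts', 'human_explanation']
>
> A real version would presumably just be checkboxes in the PR template, with anything this mechanical reserved for inner-ring changes, if used at all.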
>
> (5) Copyright / legality
>
> This is the part where I think a little conservatism is justified. The legal landscape around training data, derived works, and obligations around GPL/AGPL/LGPL is still evolving across jurisdictions. NumPy is permissively licensed, but that doesn't automatically insulate us from the provenance question if generated code ends up looking like something from a copyleft codebase. I can tell you for a fact that corporate legal-compliance folks will not hesitate to use the ban-hammer if, after some future court case, it's deemed that codebases like NumPy's are "tainted" and require roll-back.
>
> I'm not proposing we block AI tools categorically (that's neither realistic nor enforceable). But I do think it's reasonable to say:
>
> * in high-blast-radius zones, contributors should prefer tools with a clearer provenance and license posture, and we should be willing to ask for extra diligence (explanations, tests, and/or avoiding "the model wrote the entire algorithm" submissions);
> * in low-blast-radius zones, the risk/cost trade-off is different, and we can be more permissive.
>
> I also think we should explicitly acknowledge that this policy may evolve as jurisprudence and tooling clarity improve.
>
> As an additional note, over the last couple of years I have been actively working on a new "AI Rights" license & tech infrastructure to help give explicit attestation for authors of all copyrighted works, along the lines of CC Signals [1] or IETF AI Preferences [2]. I'm actually sending this from the AI Summit in Delhi where, as part of AI Commons, I'm convening allies from Creative Commons, Wikimedia, Internet Archive, Common Crawl, and others to align on a shared vision and workstreams. Those who are interested in my work on this can see the videos at links [3][4][5].
>
> If you only have time for one or two, I'd recommend [4] then [5], or just [5].
>
> I'm happy to chat in depth with any/all of you about these topics, but I want to be sensitive about not hijacking the NumPy list for my personal mad ravings, so we can take them off-list if the maintainers deem it too off-topic. If, on the other hand, y'all want to have a dialogue about this stuff here, I can think of no finer group to pressure-test my ideas. :-)
>
> Cheers,
> Peter
>
> (In the spirit of transparency and dogfooding: some parts of this email came from a thread summarization and initial dialog with GPT 4o and 5.2, although I only used the output as a starting point and edited heavily afterwards.)
>
> [1] https://creativecommons.org/ai-and-the-commons/cc-signals/
> [2] https://datatracker.ietf.org/wg/aipref/about/
> [3] https://www.youtube.com/watch?v=oZHl4NWaO7c
> [4] "AI for All": https://www.youtube.com/watch?v=TLZ9zXnluc8
> [5] "AI Training & The Data Commons in Crisis": https://www.youtube.com/watch?v=CdKxgT1o864

A prompt template that people can use with their code generation might be helpful. As an example of such, see https://x.com/WEschenbach/status/2022189308065796295. Matthew Rocklin had something similar in his discussion of hooks. The idea is to find ways to avoid some problems up front.
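
Something minimal along these lines, purely as an illustration (the wording below is made up, not taken from the linked example):

    # Hypothetical tools/ai_prompt_preamble.py -- a snippet contributors could
    # paste at the top of a code-generation prompt. Wording is illustrative only.
    PROMPT_PREAMBLE = """\
    You are helping draft a contribution to NumPy (BSD-3-Clause licensed).
    Constraints:
    - Do not reproduce code from GPL/AGPL/LGPL or other copyleft projects.
    - Prefer standard, well-documented algorithms and name the reference followed.
    - Keep the change small and reviewable; include tests for new behavior.
    - Call out any numerical-stability or ABI-sensitive assumptions explicitly.
    The result will be disclosed as AI-assisted in the pull request description.
    """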

Chuck

_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
