Hey everyone,

Sorry to be jumping in so late on this important thread, and thanks to everyone for the thoughtful discussion. NumPy is such a visible and important project that I'm sure whatever is decided here will have massive downstream consequences for the rest of the tech world.
I'm happy to see that there seems to be an emerging consensus of "principles over policing": responsibility, understanding, transparency. One way I think we can make this more concrete is by framing it around *where* in the project a contribution lands, and what the "blast radius" looks like, in both space and time, if we later need to debug, rewrite, or (worst case) roll back code for legal/regulatory reasons.

A few thoughts / straw man of an "AI contribution policy":

(1) Defining "Zones" of the project

We can explicitly acknowledge that different areas of the codebase have different tolerance levels for AI-assisted generation. A "zone" doesn't have to mean a single folder. It can be:

* a directory tree (e.g., `numpy/core`, `numpy/linalg`, `doc/`, `tools/`, `benchmarks/`),
* part of the import namespace (public API, internal helpers),
* or a semantic area (core algorithms, ABI-sensitive paths, numerical stability-critical routines, tutorial content, CI glue, etc.).

For example:

* Inner ring (high scrutiny): core algorithms, numerics, ABI/API-critical code paths, anything performance-critical or subtle correctness-wise.
* Middle ring (moderate): tests, refactors, coverage expansion, internal tooling, build/CI scripts.
* Outer ring (low): examples, tutorials, onboarding docs, "glue" that's easy to replace, small utilities, beginner-facing content.

This is explicitly saying that the question of AI isn't a binary choice, nor, e.g., "AI is forbidden in the core". Rather, the closer you get to code with a high blast radius, the more we should demand human legibility, reviewable provenance, and high confidence in correctness and licensing posture.

(2) "Blast radius"

When someone asks "should we accept AI-generated code here?", I think they already have an implicit model of the blast radius. We can render that model explicit with a few dimensions (a rough sketch of how the zones and these dimensions could fit together follows after this list):

* Complexity: Is this code easy to reason about? Does it involve numerical stability, tricky invariants, edge-case handling, low-level memory behavior, or algorithmic subtlety?
* Impact & dependency surface: How many downstream things depend on this? Is it part of the public API? Widely imported? Does it affect core array semantics? If it changes, do we risk broad downstream breakage?
* Stability & expected lifespan: Is this an area that tends to be stable for years (core numerics), or something we expect to churn (docs examples, CI harnesses)? The longer something is expected to persist, the higher the cost of "oops".
* Rollback/replacement cost: If we had to remove it quickly, how painful would that be? How entangled is it with other code? How hard is it to recreate by hand?
* Legibility/testability: Can we test it robustly? Can we write property tests? Are there known oracles? Is it feasible to get strong confidence quickly?
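To make that a bit less abstract, here is a minimal, purely illustrative sketch of what a path-to-zone lookup plus a blast-radius checklist could look like as a review aid. The module layout, the globs, the ring assignments, and the function names are all hypothetical; this is not proposed NumPy tooling.

```python
# Illustrative sketch only: map changed paths to zones and scale the reviewer
# checklist accordingly. Globs and ring assignments are placeholders.
from fnmatch import fnmatch

ZONE_GLOBS = [
    # Checked in order; first match wins. Test paths come before core paths so
    # that tests under numpy/ land in the middle ring, as in the straw man above.
    ("middle", ["numpy/**/tests/*", "tools/*", ".github/workflows/*", "benchmarks/*"]),
    ("inner",  ["numpy/_core/*", "numpy/core/*", "numpy/linalg/*", "numpy/fft/*"]),
    ("outer",  ["doc/*", "*.md", "*.rst"]),
]

# Blast-radius questions a reviewer (or a bot commenting on a PR) could ask;
# "yes" answers push toward more scrutiny.
BLAST_RADIUS_CHECKLIST = [
    "Does it touch numerical stability, tricky invariants, or low-level memory behavior?",
    "Is it part of the public API or widely imported internally?",
    "Is it expected to persist for years rather than churn?",
    "Would rolling it back be painful or entangled with other code?",
    "Is it hard to test robustly (no property tests, no known oracle)?",
]

def zone_for_path(path: str) -> str:
    """Return the zone of the first matching glob, defaulting to moderate scrutiny."""
    for zone, globs in ZONE_GLOBS:
        if any(fnmatch(path, g) for g in globs):
            return zone
    return "middle"

def zone_for_pr(changed_paths: list[str]) -> str:
    """A PR inherits the most restrictive zone among the files it touches."""
    rank = {"inner": 0, "middle": 1, "outer": 2}
    return min((zone_for_path(p) for p in changed_paths), key=rank.__getitem__)

def review_prompts(zone: str) -> list[str]:
    """Inner-ring changes get the full checklist; outer-ring ones only a provenance sanity check."""
    if zone == "inner":
        return BLAST_RADIUS_CHECKLIST
    if zone == "middle":
        return BLAST_RADIUS_CHECKLIST[:2]
    return ["Is the provenance/licensing posture of the generated text plausible?"]

if __name__ == "__main__":
    # e.g. a PR touching linalg internals plus a docs page is treated as inner ring
    print(zone_for_pr(["numpy/linalg/_linalg.py", "doc/source/index.rst"]))  # -> inner
```

The point of the toy is just that "which ring is this PR in?" can be answered mechanically, so the human effort goes into the blast-radius questions rather than the classification.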
(3) Transparency: make it concrete, not vague

I think "please be transparent" is right. At a minimum, I think we need something like a lightweight attestation or affirmation from the contributor: not a legal affidavit, not a license, not an attempt to police workflow, but a structured statement that sets reviewer expectations and creates an audit trail. Something along these lines (details can be bikeshedded later):

**AI Use Attestation (for PR description template / checkboxes):**

* Did you use an AI tool to generate or substantially modify code in this PR? (yes/no)
* If yes:
  * which tool/model (and ideally version/date, since these change fast),
  * what parts of the PR were AI-assisted (core logic vs. tests vs. docs vs. refactors),
  * confirm: "I understand this code, I can explain it, and I'm responsible for it."

Then scale the requested detail by zone / blast radius:

* Outer/middle ring: model/tool plus a high-level description is probably enough.
* Inner ring / high blast radius: I'd like us to consider asking for more:
  * the prompts (or at least the key prompts) used to generate the logic,
  * any intermediate artifacts that help future maintainers understand how we got here (e.g., the "why" behind design choices, variants considered, constraints given to the model),
  * and ideally a short human-written explanation of the algorithm and invariants (which is good practice regardless of AI).

I can appreciate that this seems burdensome for small PRs fixing a tiny thing. But the above is just a straw man, and perhaps there are some nice simplifications we can engineer to make this as lightweight a part of the workflow as possible. (Perhaps even a stub Numpy_ai_contribution_guide.md that gives the code-gen LLM a template to fill out and include in the PR?)

The analogy I have in mind: treat AI like a nondeterministic *semantic compiler*. With a normal compiler, we keep intermediate info when we care: flags, versions, debug symbols, build logs. For high-blast-radius code, the prompts and intermediate reasoning are effectively that metadata. Even if we don't store everything in-repo, capturing it in the PR discussion is valuable. It is essentially like preserving the seed when we have to check in the output of an RNG.

(4) Why keep prompts / artifacts? (forward-looking CI idea)

One reason I care about preserving the trail: I can imagine a future "AI regression / reproducibility" check (a toy sketch follows at the end of this section). Say it's 6–12 months from now and AI coding tools are even stronger. If we have prompts and model versions recorded for high-blast-radius contributions, we could run a periodic (maybe opt-in) workflow that:

* replays historical prompts against a current/known model environment,
* compares the generated output structurally (or semantically) to what we merged,
* and validates that the merged implementation still matches expected numerical behavior (tests + known benchmarks).

Even if we never automate this, having the trace helps humans debug: "what constraints were assumed?", "what source did it mirror?", "what was the intended invariant?"
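To make the attestation and the replay idea slightly more tangible, here is a rough, purely illustrative sketch of how the attested metadata might be captured as structured data and later consumed by such a check. The field names, zone labels, and the injected `replay` callable are all hypothetical placeholders; nothing here is an existing NumPy workflow or a worked-out design.

```python
# Purely illustrative: one way the AI-use attestation could be recorded per PR
# and later fed to an opt-in reproducibility check. The model-replay step is
# injected as a callable because no specific model API is assumed here.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AIAttestation:
    used_ai: bool
    tool: str = ""                                   # e.g. model name
    model_version: str = ""                          # version/date, since these change fast
    assisted_parts: list[str] = field(default_factory=list)  # "core logic", "tests", "docs", ...
    prompts: list[str] = field(default_factory=list)         # kept only for high-blast-radius PRs
    human_explanation: str = ""                      # short write-up of the algorithm/invariants
    understood_and_responsible: bool = False         # "I understand this code and I'm responsible for it"

def required_fields(zone: str) -> list[str]:
    """Scale the requested detail by zone, per the straw man above."""
    base = ["used_ai", "tool", "assisted_parts", "understood_and_responsible"]
    if zone in ("outer", "middle"):
        return base
    # inner ring / high blast radius: also keep prompts and a human explanation
    return base + ["model_version", "prompts", "human_explanation"]

def reproducibility_report(att: AIAttestation, merged_source: str,
                           replay: Callable[[str, str, str], str]) -> dict:
    """Sketch of the future opt-in check: replay recorded prompts against whatever
    model environment `replay` wraps at the time, and compare to what was merged.

    A real check would compare structurally/semantically (ASTs, property tests,
    numerical benchmarks), not by string equality; this only shows the shape.
    """
    matches = [replay(p, att.tool, att.model_version).strip() == merged_source.strip()
               for p in att.prompts]
    return {"prompts_replayed": len(matches),
            "all_match_merged": all(matches) if matches else None}
```

Even in this toy form, `required_fields()` is where the zone / blast-radius decision actually bites, and the `prompts` field is the "RNG seed" the compiler analogy argues for keeping.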
(5) Copyright / legality

This is the part where I think a little conservatism is justified. The legal landscape around training data, derived works, and obligations around GPL/AGPL/LGPL is still evolving across jurisdictions. NumPy is permissively licensed, but that doesn't automatically insulate us from the provenance question if generated code ends up looking like something from a copyleft codebase. I can tell you for a fact that corporate legal compliance folks will not hesitate to use the ban-hammer if, after some future court case, it's deemed that codebases like NumPy's are "tainted" and require rollback.

I'm not proposing we block AI tools categorically (that's neither realistic nor enforceable). But I do think it's reasonable to say:

* in high-blast-radius zones, contributors should prefer tools with a clearer provenance and license posture, and we should be willing to ask for extra diligence (explanations, tests, and/or avoiding "the model wrote the entire algorithm" submissions);
* in low-blast-radius zones, the risk/cost trade-off is different, and we can be more permissive.

I also think we should explicitly acknowledge that this policy may evolve as jurisprudence and tooling clarity improve.

As an additional note, over the last couple of years I have been actively working on a new "AI Rights" license & tech infrastructure to help give explicit attestation for authors of all copyrighted works, along the lines of CC Signals [1] or the IETF AI Preferences work [2]. I'm actually sending this from the AI Summit in Delhi where, as part of AI Commons, I'm convening allies from Creative Commons, Wikimedia, Internet Archive, Common Crawl, and others to align on a shared vision and workstreams. Those who are interested in my work on this can see the videos at links [3][4][5]. If you're going to watch just one, I'd recommend [4] then [5], or just [5].

I'm happy to chat in depth with any/all of you about these topics, but I want to be sensitive about not hijacking the NumPy list for my personal mad ravings, so we can take them off-list if the maintainers deem it too off-topic. If, on the other hand, y'all want to have a dialogue about this stuff here, I can think of no finer group to pressure-test my ideas. :-)

Cheers,
Peter

(In the spirit of transparency and dogfooding: some parts of this email came from a thread summarization and initial dialogue with GPT 4o and 5.2, although I only used the output as a starting point and edited heavily afterwards.)

[1] https://creativecommons.org/ai-and-the-commons/cc-signals/
[2] https://datatracker.ietf.org/wg/aipref/about/
[3] https://www.youtube.com/watch?v=oZHl4NWaO7c
[4] "AI for All": https://www.youtube.com/watch?v=TLZ9zXnluc8
[5] "AI Training & The Data Commons in Crisis": https://www.youtube.com/watch?v=CdKxgT1o864
