Hi,

On Mon, Feb 23, 2026 at 8:53 AM Sebastian Berg
<[email protected]> wrote:
>
> On Sun, 2026-02-22 at 10:19 -0500, Marten van Kerkwijk via NumPy-
> Discussion wrote:
> > Ralf Gommers via NumPy-Discussion <[email protected]>
> > writes:
> >
> > [snip]
> >
> > > I do think a web of trust is a potentially valuable idea. However,
> > > the need right now isn't
> > > there yet (at least for NumPy) and it does have the potential to
> > > close the door pretty strongly
> > > to newcomers. On the other hand, we already don't run CI on PRs
> > > from first-time
> > > contributors - that was something that turned out to be necessary
> > > to limit wasting
> > > resources. A web of trust is something to keep in mind in my
> > > opinion, and consider adopting
> > > if and when it becomes a clear win for maintainer load.
> >
> > Thanks for the reminder that we do not run CI for first-time
> > contributors.  That is nice in that there is already a mechanism in
> > place to recognize those.  As an intermediate step towards trust (but
> > not yet a web of it!), would it make sense to have a welcome message
> > that asks the new contributor to introduce themselves by editing
> > their
> > top comment? I.e., something like this:
>
>
> Thanks for the suggestions on what concrete steps we should do (and I
> agree we should do something).
> I would be fine with basically adopting either of these; adopting the
> SciPy/SymPy one seems pragmatic, as it is nice to keep things similar
> across similar projects. SymPy/LLVM do have a pretty clear note on
> copyright (not all do, I think). [1]
> (I like many things about the LLVM one; it is nicely explicit about
> its reasoning, etc., but I guess that also makes it longer.)
>
> To me they all get the important points across. And honestly, I
> suspect many contributors won't read it anyway, so it may mostly be
> something to point to in the rare case where you close a PR.
>
> To achieve better transparency, I would suggest we add check-boxes.
> E.g., sklearn has this now:
>
>     <!--
>     If AI tools were involved in creating this PR, please check all
>     boxes that apply below and make sure that you adhere to our
>     Automated Contributions Policy:
>     https://scikit-learn.org/dev/developers/contributing.html#automated-contributions-policy
>     -->
>     I used AI assistance for:
>     - [ ] Code generation (e.g., when writing an implementation or
>           fixing a bug)
>     - [ ] Test/benchmark generation
>     - [ ] Documentation (including examples)
>     - [ ] Research and understanding
>
> I am not sure how well it is used, but I think that is a good start to
> see where it goes. I could imagine trying to put in something about the
> scope of AI use, but I am not sure if it matters. It may be easier to
> just follow up for PRs where it is unclear.
>
> (FWIW, I like the comment asking for a bit of personal context, it
> feels both helpful and welcoming! But I think when it comes to AI
> specifically, I would start with the check-boxes for pragmatism.)

Unfortunately, as Oscar's example showed (and other slop PRs seem to
confirm), it looks as though the check-boxes will be entirely useless:
the AI is perfectly capable of filling them out for you, and (as far
as we know) has no qualms about choosing the answers most likely to
get the PR merged.

That in turn has major costs in terms of maintainer burn-out, as
Marten and Matt H are pointing out.

I'm increasingly leaning towards allowing no AI-generated code at all,
unless it is a) from a well-trusted contributor, and b) justified by
that contributor.

> [1] I would be happy with linking out to continuing discussion towards
> the note in the LLVM one: "Artificial intelligence systems raise many
> questions around copyright that have yet to be answered"
> But I think that is about as much as I want to focus on that point in
> something targeted for contributors.

Just flagging - but if we aren't asking the contributor to address
copyright, we have two options:

* The maintainer does it. I don't think there's any chance that will
happen in practice.
* We effectively decide we aren't going to worry about AI copyright violations.

I realize the second option is the de-facto preference of some here,
but if that's so, I think we have to say that out loud.

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
