[Numpy-discussion] Re: Current policy on AI-generated code in NumPy

Matthew Brett via NumPy-Discussion Sat, 14 Feb 2026 02:01:08 -0800

Hi Sebastian,

On Sat, Feb 14, 2026 at 9:00 AM Sebastian Berg
<[email protected]> wrote:
>
> On Sat, 2026-02-14 at 08:35 +0200, matti picus via NumPy-Discussion
> wrote:
> > On Fri, 13 Feb 2026 at 23:51, Robert Kern via NumPy-Discussion <
> > [email protected]> wrote:
> >
> > > On Wed, Feb 11, 2026 at 6:26 PM Matthew Brett via NumPy-Discussion
> > > <[email protected]> wrote:
> > >
> > > >
> > > > Just to clarify, in case it wasn't clear: what I'm floating as a
> > > > proposal would be something like this, as a message to PR authors:
> > > >
> > > > Please specify one of these:
> > > >
> > > > 1) I wrote this code myself, without looking at significant
> > > > AI-generated code, OR
> > > > 2) The code contains AI-generated content, but the AI-generated code
> > > > is sufficiently trivial that it cannot reasonably be subject to
> > > > copyright, OR
> > > > 3) There is non-trivial AI-generated code in this PR, and I have
> > > > documented my searches to confirm that no parts of the code are
> > > > subject to existing copyright.
> > > >
> > > > So the burden for the reviewer is just to confirm, in case 3, that
> > > > the author has documented their searches. We take the word of the
> > > > contributor for the option they have chosen. Obviously, the
> > > > documentation requirement of case 3 is somewhat of a burden for the
> > > > contributor, and may therefore encourage them to write the code
> > > > themselves, to avoid that burden. That might not be a bad thing,
> > > > long term, for the project, and it seems reasonable to me as some
> > > > defence against copyright violation, and as a message that the
> > > > project cares about such violation.
> > > >
> > >
> > > For Case 3, I would love to see an example of the search that you
> > > would accept. If you could take a recent PR (human or AI, doesn't
> > > really matter for this purpose) and show the search that would
> > > satisfy you, that would go a long way towards clarifying what you are
> > > asking for here. We'd need a worked example or two before adopting
> > > this policy, because if I don't know what you are asking for, no new
> > > contributor will, either.
> > >
> > > --
> > > Robert Kern
> > > _______________________________________________
> > > NumPy-Discussion mailing list -- [email protected]
> > > To unsubscribe send an email to [email protected]
> > > https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> > > Member address: [email protected]
> >
> >
> > Here is an example right now in NumPy. Apparently someone is deep
> > diving into performance edge cases. They (most likely with the help of
> > AI, or entirely by AI) submitted a three-line PR
> > https://github.com/numpy/numpy/pull/30810 to speed up np.array_equal.
> > Now the same author has submitted a much bigger PR to speed up np.isin
> > https://github.com/numpy/numpy/pull/30828. Is the work the product of
> > AI? Yes, but the author claims to have verified the code. Is the author
> > an AI or not? Should we proceed with the PR?
>
>
> I suppose my opinion for now is: If you as a maintainer care/want to.
> And that part I would be happy to put into a policy if need be (which
> can/should mention more things!).
>
> The below got much longer... need to read: What Marten said ;).
>
>
> In practice, my issue with this type of PR isn't really about
> copyright, or about the possibility that almost all the work was done
> by an AI.
> (Not because copyright isn't an issue; I just don't think there have
> been PRs of a kind where I would be seriously worried about it.)
>
> It is really about the social dynamics, and if a policy can help with
> that, I am all for it.
> Before, we had at least one of three intrinsically motivating reasons
> to look at a PR/issue:
> * We knew the submitter cares about seeing the feature (i.e. the
>   result, not contributing for contribution's sake).
> * It is just for contribution's sake, but we are investing in the
>   community, i.e. we like helping!
> * Or I happen to care about it myself. (That could be scratching my
>   own itch or thinking it is important for the project.)
>
> With the old waves of PRs from students, Hacktoberfest, ... (you pick
> one), the community investment/interaction point applied in some form
> and added some motivation.
> With the current wave, I think an issue is that it more often leaves
> the maintainer without _any_ of those motivational points applying. I
> am not even sure that the wave is bigger yet (but it is probably more
> of a swelling).
>
>
> This actually started with issues, I think? My feeling is that we have
> more tiny issues (CuPy is a better example than NumPy here), issues
> that look like some tool found them. They are often long and verbose,
> and maybe a PR even gets merged in the end, but I can't help but
> think: well, we just fixed an issue that possibly zero people in the
> world care about seeing fixed!
>
> Don't get me wrong, they are real issues and PRs! I like having extra
> context for motivation [1], and I think we may need to manage them more
> (and that may mean putting up a policy to discourage them, or to point
> to when closing them).


I could not agree more, and Stefan's blog post makes a similar, and
very good, argument.

Cheers,

Matthew