[Numpy-discussion] Re: Current policy on AI-generated code in NumPy

Sebastian Berg Sat, 14 Feb 2026 01:03:50 -0800

On Sat, 2026-02-14 at 08:35 +0200, matti picus via NumPy-Discussion
wrote:
> On Fri, 13 Feb 2026 at 23:51, Robert Kern via NumPy-Discussion <
> [email protected]> wrote:
> 
> > On Wed, Feb 11, 2026 at 6:26 PM Matthew Brett via NumPy-Discussion
> > <
> > [email protected]> wrote:
> > 
> > > 
> > > Just to clarify - in case it wasn't clear, what I'm floating as a
> > > proposal, would be something like this, as a message to PR
> > > authors:
> > > 
> > > Please specify one of these:
> > > 
> > > 1) I wrote this code myself, without looking at significant AI-
> > > generated
> > > code OR
> > > 2) The code contains AI-generated content, but the AI-generated
> > > code is
> > > sufficiently trivial that it cannot reasonably be subject to
> > > copyright OR
> > > 3) There is non-trivial AI-generated code in this PR, and I have
> > > documented my searches to confirm that no parts of the code are
> > > subject to
> > > existing copyright.
> > > 
> > > So - the burden for the reviewer is just to confirm, in case 3,
> > > that the
> > > author has documented their searches.   We take the word of the
> > > contributor
> > > for the option they have chosen.   Obviously, the documentation
> > > requirement
> > > of case 3 is somewhat of a burden for the contributor, and may
> > > therefore
> > > encourage them to write the code themselves, to avoid that
> > > burden.  That
> > > might not be a bad thing, long term, for the project, and it
> > > seems
> > > reasonable to me as some defence against copyright violation, and
> > > a message
> > > that the project cares about such violation.
> > > 
> > 
> > For Case 3, I would love to see an example of the search that you
> > would
> > accept. If you could take a recent PR (human or AI, doesn't really
> > matter
> > for this purpose), and show the search that would satisfy you, that
> > would
> > go a long way towards clarifying what you are asking for here. We'd
> > need a
> > worked example or two before adopting this policy because if I
> > don't know
> > what you are asking for, no new contributor will, either.
> > 
> > --
> > Robert Kern
> > _______________________________________________
> > NumPy-Discussion mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
> > https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> > Member address: [email protected]
> 
> 
> Here is an example right now in NumPy. Apparently someone is deep
> diving
> into performance edge cases. They (most likely with the help of ai or
> totally by ai) submitted a three line PR
> https://github.com/numpy/numpy/pull/30810 to speed up np.array_equal.
> Now
> the same author submitted a much bigger PR to speed up np.isin
> https://github.com/numpy/numpy/pull/30828. Is the work the product of
> ai?
> Yes, but the author claims to have verified the code. Is the author
> ai or
> not? Should we proceed with the PR?



I suppose my opinion for now is: proceed if you, as a maintainer, care
or want to. That part I would be happy to put into a policy if need be
(which can/should mention more things!).

The below got much longer... need to read: What Marten said ;).


In practice, the issue I have with this type of PR isn't so much about
copyright, or the possibility that almost all the work was done using
an AI.
(Not because copyright isn't an issue; I just don't think there have
been PRs of a kind where I would be seriously worried about it.)

It is really about the social dynamic and if a policy can help with
that, I am all for it.
Before, we had at least one of three intrinsically motivating reasons
to look at a PR/issue:
* We knew the submitter cares about seeing the feature (i.e. the
  result, not just for contribution's sake).
* It is just for contribution's sake, but we are investing in
  community. I.e. we like helping!
* Or I happen to care about it myself. (That could be scratching my
  own itch or thinking it is important for the project.)

With the old waves of PRs from students, Hacktoberfest, ... you pick
one. We had the community investment/interaction point applying in some
form and adding some motivation.
With the current wave, I think the issue is that it more often leaves
the maintainer without _any_ of those motivational points applying. I
am not even sure that the wave is bigger yet (it is probably more of a
swell).


This actually started with issues, I think? My feeling is we have more
tiny issues (CuPy is a better example than NumPy here).
Issues that seem like some tool found them. They are often long and
verbose, and in the end maybe a PR even gets merged, but afterwards I
can't help but think: Well, we just fixed an issue that possibly zero
people in the world care about seeing fixed!

Don't get me wrong, they are real issues and PRs! I like having extra
context for motivation [1], and I think we may need to manage them more
(and that may mean putting up a policy to discourage them, or to point
to when closing).

- Sebastian


[1] Also, if a human creates an issue, I think it is nice to have a
note like "this crashed my hour-long job" vs. "my funny Advent of Code
solution started failing" (a real regression, btw., that I closed
because it was caused by a fix).


> Matti
