[Numpy-discussion] Re: Current policy on AI-generated code in NumPy

Matthew Brett via NumPy-Discussion Wed, 18 Feb 2026 06:23:43 -0800

Hi,

On Sat, Feb 14, 2026 at 5:38 PM Robert Kern <[email protected]> wrote:
>
> On Sat, Feb 14, 2026 at 12:17 PM Matthew Brett <[email protected]> wrote:
>>
>> Hi,
>>
>> On Fri, Feb 13, 2026 at 9:45 PM Robert Kern <[email protected]> wrote:
>> >
>> > On Wed, Feb 11, 2026 at 6:26 PM Matthew Brett via NumPy-Discussion 
>> > <[email protected]> wrote:
>> >>
>> >>
>> >> Just to clarify - in case it wasn't clear, what I'm floating as a 
>> >> proposal, would be something like this, as a message to PR authors:
>> >>
>> >> Please specify one of these:
>> >>
>> >> 1) I wrote this code myself, without looking at significant AI-generated 
>> >> code OR
>> >> 2) The code contains AI-generated content, but the AI-generated code is 
>> >> sufficiently trivial that it cannot reasonably be subject to copyright OR
>> >> 3) There is non-trivial AI-generated code in this PR, and I have 
>> >> documented my searches to confirm that no parts of the code are subject 
>> >> to existing copyright.
>> >>
>> >> So - the burden for the reviewer is just to confirm, in case 3, that the 
>> >> author has documented their searches.   We take the word of the 
>> >> contributor for the option they have chosen.   Obviously, the 
>> >> documentation requirement of case 3 is somewhat of a burden for the 
>> >> contributor, and may therefore encourage them to write the code 
>> >> themselves, to avoid that burden.  That might not be a bad thing, long 
>> >> term, for the project, and it seems reasonable to me as some defence 
>> >> against copyright violation, and a message that the project cares about 
>> >> such violation.
>> >
>> >
>> > For Case 3, I would love to see an example of the search that you would 
>> > accept. If you could take a recent PR (human or AI, doesn't really matter 
>> > for this purpose), and show the search that would satisfy you, that would 
>> > go a long way towards clarifying what you are asking for here. We'd need a 
>> > worked example or two before adopting this policy because if I don't know 
>> > what you are asking for, no new contributor will, either.
>>
>> Yes, that's a reasonable request.   But how do you think I should
>> proceed?   Make an issue on Numpy, and start drafting?   Start another
>> email thread?  Or a Discourse / Scientific Python thread?
>
>
> Just here should be fine. Take an existing PR that has copyrightable content 
> (e.g. an entire new function or three, each more than ~10 lines, not just 
> many one-line updates scattered around; the most interesting ones would be 
> those that implement a known algorithm). Do the code search that would 
> satisfy you. Write out here what you would want a PR author to provide.


I'd suggested (off-list) that this might be better done in another
thread - but perhaps it can be done here.

After some reflection and experimenting: there are many caveats, but I
think it is reasonable to give the contributor some formal
responsibility for taking care over copyright.

One way of doing that is to ask an AI (if possible, one other than
the AI that generated the code) to review the code for possible
copyright problems.  I've experimented with that over at
https://github.com/numpy/numpy/pull/30828#issuecomment-3920553882 .
The idea would be that we ask a contributor who has generated code
with AI to do this review as part of the PR sign-off.  They should be
in a much better position to do this than the maintainers, as they
should have been exploring the problem themselves, and so should be
able to write better queries to guide the AI review.  And with the
prompts as a starting point, it's not particularly time-consuming.

Cheers,

Matthew
