[Numpy-discussion] Re: Current policy on AI-generated code in NumPy

Matthew Brett via NumPy-Discussion Wed, 11 Feb 2026 15:28:29 -0800

Hi,

On Wed, Feb 11, 2026 at 11:02 PM Lucas Colley via NumPy-Discussion <
[email protected]> wrote:


> Hi Matthew,
>
> That all sounds reasonable to me so far, but what are the next steps?
>
> > put a heavy requirement on contributors to either a) write the code
> themselves, perhaps having asked for preliminary analysis (but not
> substantial code drafts) from AI
>
> Is this enforceable to a significant extent? If not, in what sense could
> it pose a genuinely ‘heavy requirement’?
>
> > or b) write the code with AI, but demonstrate that they have done the
> research to establish the generated code does not breach copyright.
>
> Perhaps this is more enforceable? But to be honest it is still quite
> unclear to me how I would establish with certainty that code I’ve had
> generated does not breach copyright, much less code that is being presented
> to me by a contributor. Do you see how to realise a ‘heavy requirement’
> here?
>
> I agree with the spirit of the thought that the burden (if it is to exist)
> needs to be shifted away from maintainers, but it’s unclear to me how we
> can actually shift it elsewhere.
>
> As we discussed last year, I think we have a start at a decent argument
> towards including a checkbox in PR templates which contributors must tick
> to state that they recognise the risk of copyright violation via LLM
> generated code and take responsibility for the code they are submitting:
> https://github.com/matthew-brett/sp-ai-post/issues/2#issuecomment-2935428854
> .
>
> Even there though, there are still multiple debatable premises. Of course,
> we can hardly aim for some sort of logical proof of the right way forward,
> but I think we need more focused attention and argument towards a specific
> and understandable goal if we are to be able to come to consensus on some
> concrete steps forward. It is to this thread's merit that the discussion
> has been so varied and touched on many topics, but it is also demonstrative
> of the problem that broad and vague back-and-forths don’t really help
> settle on anything concrete.
>

Just to clarify, in case it wasn't clear: what I'm floating as a proposal
would be something like this, as a message to PR authors:

Please specify one of these:

1) I wrote this code myself, without looking at significant AI-generated
code OR
2) The code contains AI-generated content, but the AI-generated code is
sufficiently trivial that it cannot reasonably be subject to copyright OR
3) There is non-trivial AI-generated code in this PR, and I have documented
my searches to confirm that no parts of the code are subject to existing
copyright.

So the burden for the reviewer is just to confirm, in case 3, that the
author has documented their searches. We take the word of the contributor
for the option they have chosen. Obviously, the documentation requirement
of case 3 is something of a burden for the contributor, and may therefore
encourage them to write the code themselves to avoid it. That might not be
a bad thing, long term, for the project, and it seems reasonable to me as
some defence against copyright violation, and as a message that the project
cares about such violation.
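
As a rough illustration only, the reviewer-side check could in principle be
automated along the lines below. This is a sketch under assumptions, not
anything NumPy uses: the checkbox syntax ("- [x] N)"), the "Copyright
searches" heading, and all function names are hypothetical, chosen just to
show the logic of "exactly one option, and case 3 needs documented searches".

```python
import re

# Hypothetical short labels for the three declarations in the proposal.
OPTIONS = {
    1: "wrote the code myself",
    2: "AI-generated content is trivial",
    3: "non-trivial AI-generated code, searches documented",
}

# Hypothetical heading under which a contributor records their searches.
SEARCH_HEADING = "Copyright searches"

# Matches a ticked checkbox line such as "- [x] 3) ..." (assumed syntax).
TICKED = re.compile(r"^\s*[-*]\s*\[[xX]\]\s*(\d)\)", re.MULTILINE)


def checked_options(pr_body):
    """Return the sorted option numbers ticked in the PR description."""
    return sorted({int(m.group(1)) for m in TICKED.finditer(pr_body)})


def has_search_notes(pr_body):
    """True if some non-blank text follows the search heading."""
    _, found, after = pr_body.partition(SEARCH_HEADING)
    return bool(found) and bool(after.strip())


def review_declaration(pr_body):
    """Return (ok, message) for the declaration in a PR description.

    The contributor's word is taken for options 1 and 2; for option 3 the
    only check is that some documentation of searches is present.
    """
    chosen = checked_options(pr_body)
    if len(chosen) != 1 or chosen[0] not in OPTIONS:
        return False, "Please tick exactly one of the three options."
    if chosen[0] == 3 and not has_search_notes(pr_body):
        return False, "Option 3 requires documented searches."
    return True, f"Option {chosen[0]} declared: {OPTIONS[chosen[0]]}."
```

Note that this only checks that documentation exists; judging whether the
searches are adequate would still be up to a human.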

Cheers,

Matthew

