Hi Sebastian,

On Wed, Feb 11, 2026 at 3:49 PM Benjamin Root via NumPy-Discussion
<[email protected]> wrote:
> Just a heads-up, AI Agents are now shame-posting for getting their PR
> closed. Just happened this morning in matplotlib.
>
> On Wed, Feb 11, 2026 at 4:34 AM Sebastian Berg
> <[email protected]> wrote:
> >
> > On Tue, 2026-02-10 at 16:18 -0800, Stefan van der Walt via NumPy-
> > Discussion wrote:
> > > On Tue, Feb 10, 2026, at 15:12, Evgeni Burovski wrote:
> > > >
> > > > > 3. be careful not to breach any copyright or license terms (yes,
> > > > > we take those seriously!).
> > > >
> > > > For a contributor this recommendation is not easily actionable. "I
> > > > used a tool X and it gave me this code" --- how to make sure I
> > > > understand the code, this is clear, yes, I can do that; how am I
> > > > meant to carefully check for copyright?
> > >
> > > It's near impossible, so I suspect the only way to truly play it safe
> > > is to only provide code that cannot reasonably be copyrighted.
> >
> > TL;DR: To "be careful not to break copyright" just states a fact? How
> > scary that fact is depends a bit on the viewpoint / how likely it is
> > that agents violate copyright.
> > If there is guidance e.g. from some large OSS foundation, I would
> > prefer to link to that rather than try to figure it out ourselves...
> >
> > ---
> >
> > Copyright violation is a problem. But I am not sure it is a huge one
> > for many contributions? I.e. just because they are very project
> > specific or small. [1]
> >
> > However, I still think that this isn't new at all: by contributing, we
> > already agree to licensing the code under the project's license, and
> > that means being sure we are allowed to license it that way.
> > And while we don't make you sign a CLA (contributor license agreement),
> > any project that has a bit of legalese around should already have more
> > scary sentences.

Yes, it's true that the legal and ethical problem hasn't changed, but the
practical problem has changed out of all recognition. Before, there was no
reasonable likelihood of copyright becoming effectively void, in the sense
that there would be no practical way of defending your work from copying.
Now, if we do not pay close attention, that is a real possibility, and
very soon.

The difference is twofold. Imagine a substantial PR with, say, 200 lines
of new code, written by AI and checked for logic by the submitter.

a) There's a reasonable chance that the generated code will have pulled in
code to which copyright applies, without the submitter realizing that has
happened (this could not have happened before), or

b) It is now completely trivial, by simple inattention or a momentary
breach of ethics, to rewrite copyrightable code. (This was what I was
getting at with my earlier post on this.)

The complete triviality matters because, when it was not trivial, it was
much harder to do this by accident, or by momentary breach.

By analogy - consider students cheating on assignments (of which I have
some experience). Before, it was possible to cheat, for example by paying
an essay mill - but it took enough effort to force the student to consider
what they were doing. As a result they did it rarely, and it was not
common practice. After AI, it is not only possible but trivial to cheat -
and indeed, for that reason, cheating with AI has become completely
routine.
For a discussion of why teaching good practice with AI is ineffective at
preventing bad practice, see:
https://timrequarth.substack.com/p/why-ai-guidelines-arent-enough

> > So yeah, the scariness of the sentence depends on the viewpoint, but
> > at its core, I think it just states a fact?
> >
> > For myself, I don't really feel like discussing it too much without a
> > better foundation: it seems to me that books will be written, or at
> > least some OSS foundation with more legal knowledge should make
> > guidelines that we can use as a basis for our own (or as a basis for
> > discussion).

I think the legal aspect of this is more or less irrelevant for us - I
guess the chances of anyone pursuing us for copyright breach are very
small. The key issue - at least for me - is the ethical one: are we
honoring the intention of the original author in requesting recognition
for their work? And that, it seems to me, is something that we - the
Scientific Python community - are qualified to comment and decide on.

> > Maybe those already exist? Is there an OSS foundation that e.g. says:
> > Please don't use these tools due to copyright issues (or a variation)?
> >
> > You can argue we should inform contributors to err on the super safe
> > side... my gut feeling is we can't do much: discouraging the careful
> > ones while the non-careful ones don't read this anyway seems not super
> > useful.

Paul Ivanov and I discussed this in some detail in our draft blog post,
largely as a result of debate at the Scientific Python meeting in Seattle:

https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

Summary: it seemed likely to us that establishing a strong norm would in
fact be effective in reducing copyright-damaging use of AI.

> > We could force people to "sign" a CLA now if we were more worried, but
> > do we really want that (nor do I doubt it will help a lot)? [2]
> >
> > FWIW, if someone contributed a non-trivial/textbook algorithm or said
> > "implement X for/in Y", I think they clearly have to do due diligence.
> > (Of course, best case, the original code is licensed in a way that
> > derived works -- with attribution -- are unproblematic.)

We discussed this case in the blog post (link above). There is no way to
make sure that the AI will not in fact pull in other, copyrightable code,
even if you asked it to port code for which you know the copyright.

I think the safe - and possibly the best - way to do this is to put a
heavy requirement on contributors to either:

a) write the code themselves, perhaps having asked for preliminary
analysis (but not substantial code drafts) from AI, or

b) write the code with AI, but demonstrate that they have done the
research to establish that the generated code does not breach copyright.

Yes, that's a burden for the contributor, and yes, we may therefore lose
substantial AI-generated chunks of code, but I suspect we (Scientific
Python projects generally) won't suffer all that much from that
restriction in the long term - because we gain instead by having
contributors with greater understanding of the code base and of their own
PRs. That is - as Linus Torvalds seems to imply - don't write code with
AI, but use AI for analysis, maintenance and tooling.

Evgeni and I discussed the constraint of pushing the copyright burden to
submitters over at:

https://discuss.scientific-python.org/t/a-policy-on-generative-ai-assisted-contributions/1702/27

Cheers,

Matthew

--
This email is fully human-source. Unless I'm quoting AI, I did not use AI
for any text in this email.
