Hi Sebastian,

On Wed, Feb 11, 2026 at 3:49 PM Benjamin Root via NumPy-Discussion
<[email protected]> wrote:
> Just a heads-up, AI Agents are now shame-posting for getting their PR
> closed. Just happened this morning in matplotlib.
>
> On Wed, Feb 11, 2026 at 4:34 AM Sebastian Berg
> <[email protected]> wrote:
> >
> > On Tue, 2026-02-10 at 16:18 -0800, Stefan van der Walt via NumPy-
> > Discussion wrote:
> > > On Tue, Feb 10, 2026, at 15:12, Evgeni Burovski wrote:
> > > >
> > > > > 3. be careful not to breach any copyright or license terms (yes,
> > > > > we take those seriously!).
> > > >
> > > > For a contributor this recommendation is not easily actionable. "I
> > > > used a tool X and it gave me this code" --- how to make sure I
> > > > understand the code, this is clear, yes, I can do that; how am I
> > > > meant to carefully check for copyright?
> > >
> > > It's near impossible, so I suspect the only way to truly play it safe
> > > is to only provide code that cannot reasonably be copyrighted.
> >
> > TL;DR: To "be careful not to break copyright" just states a fact? How
> > scary that fact is depends a bit on the viewpoint / how likely it is
> > that agents violate copyright.
> > If there is guidance e.g. from some large OSS foundation, I would
> > prefer to link to that rather than try to figure it out ourselves...
> >
> > ---
> >
> > Copyright violation is a problem. But I am not sure it is a huge one
> > for many contributions? I.e. just because they are very project
> > specific or small. [1]
> >
> > However, I still think that this isn't new at all: by contributing, we
> > already agree to licensing the code under the project's license, and
> > that means being sure we are allowed to license it that way.
> > And while we don't make you sign a CLA (contributor license agreement),
> > any project that has a bit of legalese around should already have more
> > scary sentences.

Yes, it's true that the legal and ethical problem hasn't changed, but the
practical problem has changed out of all recognition. Before, there was no
reasonable likelihood of copyright becoming effectively void, in the sense
that there would be no practical way of defending your work from copying.
Now, if we do not pay close attention, that is a real possibility, and
very soon.

The difference is twofold. Imagine a substantial PR with, say, 200 lines
of new code, written by AI and checked for logic by the submitter.

a) There's a reasonable chance that the generated code will have pulled in
code to which copyright applies, without the submitter realizing that has
happened (this could not have happened before), or

b) It is now completely trivial, by simple inattention or a momentary
breach of ethics, to rewrite copyrightable code. (This was what I was
getting at with my earlier post on this.)

The complete triviality matters because, when it was not trivial, it was
much harder to do this by accident, or by momentary breach.

By analogy - consider students cheating on assignments (of which I have
some experience). Before, it was possible to cheat, for example by paying
an essay mill - but it took enough effort to force the student to consider
what they were doing. As a result they did it rarely, and it was not
common practice. After AI, it is not only possible but trivial to cheat -
and indeed, for that reason, cheating with AI has become completely
routine.
For a discussion of why teaching good practice with AI is ineffective at
preventing bad practice, see:
https://timrequarth.substack.com/p/why-ai-guidelines-arent-enough

> > So yeah, the scariness of the sentence depends on the viewpoint, but
> > at its core, I think it just states a fact?
> >
> > For myself, I don't really feel like discussing it too much without a
> > better foundation: it seems to me that books will be written, or at
> > least some OSS foundation with more legal knowledge should make
> > guidelines that we can use as a basis for our own (or as a basis for
> > discussion).

I think the legal aspect of this is more or less irrelevant for us - I
guess the chances of anyone pursuing us for copyright breach are very
small. The key issue - at least for me - is the ethical one: are we
honoring the intention of the original author in requesting recognition
for their work? And that, it seems to me, is something that we - the
Scientific Python community - are qualified to comment and decide on.

> > Maybe those already exist? Is there an OSS foundation that e.g. says:
> > Please don't use these tools due to copyright issues (or a variation)?
> >
> > You can argue we should inform contributors to err on the super safe
> > side... my gut feeling is we can't do much: discouraging the careful
> > ones while the non-careful ones don't read this anyway seems not super
> > useful.

Paul Ivanov and I discussed this in some detail in our draft blog post,
largely as a result of debate at the Scientific Python meeting in Seattle:

https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

Summary: it seemed likely to us that establishing a strong norm would in
fact be effective in reducing copyright-damaging use of AI.

> > We could force people to "sign" a CLA now if we were more worried, but
> > do we really want that (nor do I doubt it will help a lot)? [2]
> >
> > FWIW, if someone contributed a non-trivial/textbook algorithm or said
> > "implement X for/in Y", I think they clearly have to do due diligence.
> > (Of course, best case, the original code is licensed in a way that
> > derived works -- with attribution -- are unproblematic.)

We discussed this case in the blog post (link above). There is no way to
make sure that the AI will not in fact pull in other, copyrightable code,
even if you asked it to port code for which you know the copyright.

I think the safe - and possibly the best - way to do this is to put a
heavy requirement on contributors to either:

a) write the code themselves, perhaps having asked for preliminary
analysis (but not substantial code drafts) from AI, or

b) write the code with AI, but demonstrate that they have done the
research to establish that the generated code does not breach copyright.

Yes, that's a burden for the contributor, and yes, we may therefore lose
substantial AI-generated chunks of code, but I suspect we (Scientific
Python projects generally) won't suffer all that much from that
restriction in the long term - because we gain instead by having
contributors with greater understanding of the code base and of their own
PRs. That is - as Linus Torvalds seems to imply - don't write code with
AI, but use AI for analysis, maintenance and tooling.

Evgeni and I discussed the constraint of pushing the copyright burden to
submitters over at:

https://discuss.scientific-python.org/t/a-policy-on-generative-ai-assisted-contributions/1702/27

Cheers,

Matthew

--
This email is fully human-source. Unless I'm quoting AI, I did not use AI
for any text in this email.
