Hi,

I think it would be prudent to be more explicit with regard to the threat of copyright violation. It is almost certainly true that GPL code has been part of the training set, and it is very possible that even private code has found its way in there.
But that is also true for human contributors. After all, am I, having read Numerical Recipes in the 90s, ineligible to implement Runge-Kutta methods in open source? Surely not. Can an employee of a company write "from scratch" in open source today a method they (co-)developed at work yesterday? No, they cannot.

A straight-up copy of a large enough section of code is problematic, but it is already less likely today than an amalgamation of existing code adapted to the NumPy codebase, and IMHO will be even more so in the future. As such, Matthew's proposed compromise sounds reasonable to me. I would add, as guidance for reviewers, that PRs adding large chunks of self-contained code (i.e. entirely new big files, or hundreds of consecutive lines with no relation to existing NumPy code) deserve a higher level of scrutiny for this specific risk.

Cheers
Klaus

On Fri, Feb 13, 2026 at 1:44 PM Benjamin Root via NumPy-Discussion <[email protected]> wrote:

> The risk of copyright violation isn't just with GPL'ed code in the
> training set, but also potentially from privately held code that was
> accidentally leaked into a training set. Imagine if MathWorks or ESRI
> discover their code in our repos and decide to sue. The LLM has access
> to an unprecedented dataset of code that a human could never have and
> we can't ever be sure there isn't leaked code in it.
> _______________________________________________
> NumPy-Discussion mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> Member address: [email protected]
