Hi,

I think it would be prudent to be more explicit with regard to the threat
of copyright violation.
It is almost certainly true that GPL code has been part of the training set
and it is very possible that even private code has found its way in there.

But that is also true for human contributors. After all, am I, having read
Numerical Recipes in the 90s, ineligible to implement Runge-Kutta methods
in open source? Surely not. Can an employee of a company write "from
scratch" in open source today a method they (co-)developed at work
yesterday? No, they cannot.

A straight-up copy of a large enough section of code is problematic, but
it is less likely than an amalgamation of existing code adapted to the NumPy
codebase, already today and imho even more so in the future.

As such, Matthew's proposed compromise sounds reasonable to me, and I would
add as guidance for reviewers that PRs adding large chunks of
self-contained code, i.e. entirely new big files or hundreds of lines of
consecutive code with no relation to existing NumPy code, deserve a higher
level of scrutiny for this specific risk.

Cheers
Klaus


On Fri, Feb 13, 2026 at 1:44 PM Benjamin Root via NumPy-Discussion <
[email protected]> wrote:

> The risk of copyright violation isn't just with GPL'ed code in the
> training set, but also potentially from privately held code that was
> accidentally leaked into a training set. Imagine if MathWorks or ESRI
> discover their code in our repos and decide to sue. The LLM has access
> to an unprecedented dataset of code that a human could never have and
> we can't ever be sure there isn't leaked code in it.
> _______________________________________________
> NumPy-Discussion mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> Member address: [email protected]
>