The risk of copyright violation isn't limited to GPL'ed code in the
training set; it could also come from privately held code that was
accidentally leaked into the training data. Imagine if MathWorks or ESRI
discovered their code in our repos and decided to sue. The LLM has access
to an unprecedented dataset of code that no human could ever have, and
we can never be sure there isn't leaked code in it.