Hi,

On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett <matthew.br...@gmail.com> wrote:
>>
>> Sorry - reposting from my subscribed address:
>>
>> Hi,
>>
>> Sorry to top-post! But - I wanted to bring the discussion back to
>> licensing. I have great sympathy for the ecological and code-quality
>> concerns, but licensing is a separate question, and, it seems to me,
>> an urgent question.
>>
>> Imagine I asked some AI to give me code to replicate a particular
>> algorithm A.
>>
>> It is perfectly possible that the AI will largely or completely
>> reproduce some existing GPL code for A, from its training data. There
>> is no way that I could know that the AI has done that without some
>> substantial research. Surely, this is a license violation of the GPL
>> code? Let's say we accept that code. Others pick up the code and
>> modify it for other algorithms. The code-base gets infected with GPL
>> code, in a way that will make it very difficult to disentangle.
>
> This is a question that's topical for all of open source, and usages of
> CoPilot & co. We're not going to come to any insightful answer here that is
> specific to NumPy. There's a ton of discussion in a lot of places; someone
> needs to research/summarize that to move this forward. Debating it from
> scratch here is unlikely to yield new arguments imho.

Right - I wasn't expecting a detailed discussion on the merits - only
some thoughts on policy for now.

> I agree with Rohit's: "it is probably hopeless to enforce a ban on AI
> generated content". There are good ways to use AI code assistant tools and
> bad ones; we in general cannot know whether AI tools were used at all by a
> contributor (just like we can't know whether something was copied from Stack
> Overflow), nor whether when it's done the content is derived enough to fall
> under some other license. The best we can do here is add a warning to the
> contributing docs and PR template about this, saying the contributor needs to
> be the author so copied or AI-generated content needs to not contain things
> that are complex enough to be copyrightable (none of the linked PRs come
> close to this threshold).

Yes, these PRs are not the concern - but I believe we do need to plan
now for the future.

I agree it is hard to enforce, but it seems to me it would be a
reasonable defensive move to say - for now - that authors will need to
take full responsibility for copyright, and that, as of now,
AI-generated code cannot meet that standard, so we require authors to
turn off AI-generation when writing code for Numpy.

Cheers,

Matthew