On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett <matthew.br...@gmail.com> wrote:
> Sorry - reposting from my subscribed address:
>
> Hi,
>
> Sorry to top-post! But - I wanted to bring the discussion back to
> licensing. I have great sympathy for the ecological and code-quality
> concerns, but licensing is a separate question, and, it seems to me,
> an urgent question.
>
> Imagine I asked some AI to give me code to replicate a particular
> algorithm A.
>
> It is perfectly possible that the AI will largely or completely
> reproduce some existing GPL code for A, from its training data. There
> is no way that I could know that the AI has done that without some
> substantial research. Surely, this is a license violation of the GPL
> code? Let's say we accept that code. Others pick up the code and
> modify it for other algorithms. The code-base gets infected with GPL
> code, in a way that will make it very difficult to disentangle.
>

This is a question that's topical for all of open source, and usages of
Copilot & co. We're not going to come to any insightful answer here that is
specific to NumPy. There's a ton of discussion in a lot of places; someone
needs to research/summarize that to move this forward. Debating it from
scratch here is unlikely to yield new arguments imho.

I agree with Rohit's: "it is probably hopeless to enforce a ban on AI
generated content". There are good ways to use AI code assistant tools and
bad ones; we in general cannot know whether AI tools were used at all by a
contributor (just like we can't know whether something was copied from
Stack Overflow), nor whether, when they were used, the result is derivative
enough to fall under some other license.

The best we can do here is add a warning to the contributing docs and PR
template about this, saying that the contributor needs to be the author, so
copied or AI-generated content must not contain anything complex enough to
be copyrightable (none of the linked PRs come close to this threshold).

> Have we consulted a copyright lawyer on this? Specifically, have we
> consulted someone who advocates the GPL?
>

Not that I know of.

Cheers,
Ralf

> Cheers,
>
> Matthew
>
> On Thu, Jul 4, 2024 at 11:27 AM Marten van Kerkwijk
> <m...@astro.utoronto.ca> wrote:
> >
> > Hi All,
> >
> > I agree with Dan that the actual contributions to the documentation are
> > of little value: it is not easy to write good documentation, with
> > examples that show not just the mechanics but the purpose of the
> > function, i.e., go well beyond just showing some random inputs and
> > outputs. And poorly constructed examples are detrimental in that they
> > just hide the fact that the documentation is bad.
> >
> > I also second his worries about ecological and social costs.
> >
> > But let me add a third issue: the costs to maintainers. I had a quick
> > glance at some of those PRs when they were first posted, but basically
> > decided they were not worth my time to review. For a human contributor,
> > I might well have decided differently, since helping someone to improve
> > their contribution often leads to higher quality further contributions.
> > But here there seems to be no such hope.
> >
> > All the best,
> >
> > Marten
> >
> > Daniele Nicolodi <dani...@grinta.net> writes:
> >
> > > On 03/07/24 23:40, Matthew Brett wrote:
> > >> Hi,
> > >>
> > >> We recently got a set of well-labeled PRs containing (reviewed)
> > >> AI-generated code:
> > >>
> > >> https://github.com/numpy/numpy/pull/26827
> > >> https://github.com/numpy/numpy/pull/26828
> > >> https://github.com/numpy/numpy/pull/26829
> > >> https://github.com/numpy/numpy/pull/26830
> > >> https://github.com/numpy/numpy/pull/26831
> > >>
> > >> Do we have a policy on AI-generated code? It seems to me that
> > >> AI-code in general must be a license risk, as the AI may well generate
> > >> code that was derived from, for example, code with a GPL-license.
> > >
> > > There is definitely the issue of copyright to keep in mind, but I see
> > > two other issues: the quality of the contributions and one moral issue.
> > >
> > > IMHO the PRs linked above are not high quality contributions: for
> > > example, the added examples are often redundant with each other. In my
> > > experience these are representative of automatically generated content:
> > > as there is little to no effort involved in writing it, the content is
> > > often repetitive and with very low information density. In the case of
> > > documentation, I find this very detrimental to the overall quality.
> > >
> > > Contributions generated with AI have huge ecological and social costs.
> > > Encouraging AI generated contributions, especially where there is
> > > absolutely no need to involve AI to get to the solution, as in the
> > > examples above, makes the project co-responsible for these costs.
> > >
> > > Cheers,
> > > Dan
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > > To unsubscribe send an email to numpy-discussion-le...@python.org
> > > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > > Member address: m...@astro.utoronto.ca