Hi,

On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett <matthew.br...@gmail.com> wrote:
>>
>> Sorry - reposting from my subscribed address:
>>
>> Hi,
>>
>> Sorry to top-post!  But - I wanted to bring the discussion back to
>> licensing.  I have great sympathy for the ecological and code-quality
>> concerns, but licensing is a separate question, and, it seems to me,
>> an urgent question.
>>
>> Imagine I asked some AI to give me code to replicate a particular
>> algorithm A.
>>
>> It is perfectly possible that the AI will largely or completely
>> reproduce some existing GPL code for A, from its training data.  There
>> is no way that I could know that the AI has done that without some
>> substantial research.  Surely, this is a violation of the GPL code's
>> license?  Let's say we accept that code.  Others pick up the code and
>> modify it for other algorithms.  The code-base gets infected with GPL
>> code, in a way that will make it very difficult to disentangle.
>
>
> This is a question that's topical for all of open source, and usages of
> Copilot & co. We're not going to come to any insightful answer here that is
> specific to NumPy. There's a ton of discussion in a lot of places; someone
> needs to research/summarize that to move this forward. Debating it from
> scratch here is unlikely to yield new arguments imho.

Right - I wasn't expecting a detailed discussion on the merits - only
some thoughts on policy for now.

> I agree with Rohit's: "it is probably hopeless to enforce a ban on AI
> generated content". There are good ways to use AI code assistant tools and
> bad ones. In general we cannot know whether a contributor used AI tools at
> all (just as we can't know whether something was copied from Stack
> Overflow), nor, when they were used, whether the content is derivative
> enough to fall under some other license. The best we can do here is add a
> warning to the contributing docs and PR template, saying that the
> contributor needs to be the author, so copied or AI-generated content must
> not contain anything complex enough to be copyrightable (none of the
> linked PRs come close to this threshold).

Yes, these PRs are not the concern - but I believe we do need to plan
now for the future.

I agree it is hard to enforce, but it seems to me it would be a
reasonable defensive move to say, for now, that authors must take full
responsibility for copyright, and that, because AI-generated code cannot
currently meet that standard, we require authors to turn off
AI generation when writing code for NumPy.

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/