[Numpy-discussion] Re: Current policy on AI-generated code in NumPy

Charles R Harris via NumPy-Discussion Wed, 18 Feb 2026 18:50:29 -0800

On Wed, Feb 18, 2026 at 6:04 PM Robert Kern via NumPy-Discussion <
[email protected]> wrote:


> On Wed, Feb 18, 2026 at 7:03 PM Matthew Brett <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Wed, Feb 18, 2026 at 10:33 PM Robert Kern via NumPy-Discussion
>> <[email protected]> wrote:
>> >
>> > On Wed, Feb 18, 2026 at 9:16 AM Matthew Brett <[email protected]>
>> wrote:
>> >>
>> >> One way of doing that - is to ask some AI (if possible, an AI other
>> >> than the one generating the code) to review for copyright.  I've
>> >> experimented with that over at
>> >> https://github.com/numpy/numpy/pull/30828#issuecomment-3920553882 .
>> >> But the idea would be that we ask a contributor who has generated code
>> >> by AI, to do this as part of the PR sign-off.   They should be in a
>> >> much better position to do this than the maintainers, as they should
>> >> have been exploring the problem themselves, and therefore should be
>> >> able to write better queries to guide the AI review.   And with the
>> >> prompts as a start, it's not particularly time-consuming.
>> >
>> > I think all of the arguments it produced are not grounded in the
>> principles of copyright law. Unfortunately, I think this is one of the
>> areas where LLMs just generate plausible nonsense rather than sound legal
>> analysis. Each thing that it noted was a one-liner or a general idea,
>> nothing copyrightable. It essentially writes like a median StackOverflow
>> programmer with a dim understanding of copyright law (no slight intended to
>> anyone; I am one). I've looked at the two files it suggested, and I see no
>> similarity to the PR.
>> >
>> > I do kind of suspect that LLMs could be used, with care, to help
>> facilitate the abstraction-filtration-comparison test and maybe finding
>> candidates to do that test on, but a general instruction to give arguments
>> for copyright violation apparently yields more chaff to wade through.
>>
>> Yes, sure - and you can see me trying to negotiate with Gemini on
>> related points in an earlier session here:
>>
>> https://gist.github.com/matthew-brett/fac33f1b41d98e51b842f8bb84e8c66b
>>
>> My point was not that AI is doing a good job here - it isn't - but to
>> offer it as a starting point for further research for the PR author,
>> and reflection for those of us thinking about copyright and AI, on
>> what a better process might look like.
>>
>
> IMO, it's definitely not a good starting point for the PR author. It
> doesn't matter where it places you as a starting point if it points you in
> the wrong direction. You are asking the PR author to defend against
> incorrect statements of fact and law.
>
> I think *some* kind of code search or plagiarism detection service might
> be helpful in identifying possible original sources to compare with the
> generated output. It's not at all clear that asking the LLM as an oracle
> actually enacts such a search. It plainly did not here, but it presented
> its work as such.
>
> I don't think it's a good policy to construct an ad hoc plagiarism
> detection service without validating how it actually performs. I really
> strongly suggest that you retract your PR comment. It would be one thing to
> try it out and post here about what you found, but to interact with a
> contributor that way as an experiment is... ill-advised.
>
>
+1. The interaction on that PR as a whole struck me as harsh, verging on
rude.

Chuck

_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
