On Thu, Feb 19, 2026 at 5:24 AM Matthew Brett via NumPy-Discussion <
[email protected]> wrote:

> Hi,
>
> On Thu, Feb 19, 2026 at 2:48 AM Charles R Harris via NumPy-Discussion
> <[email protected]> wrote:
> >
> >
> >
> > On Wed, Feb 18, 2026 at 6:04 PM Robert Kern via NumPy-Discussion <
> [email protected]> wrote:
> >>
> >> On Wed, Feb 18, 2026 at 7:03 PM Matthew Brett <[email protected]>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On Wed, Feb 18, 2026 at 10:33 PM Robert Kern via NumPy-Discussion
> >>> <[email protected]> wrote:
> >>> >
> >>> > On Wed, Feb 18, 2026 at 9:16 AM Matthew Brett <
> [email protected]> wrote:
> >>> >>
> >>> >> One way of doing that is to ask some AI (if possible, an AI other
> >>> >> than the one generating the code) to review for copyright.  I've
> >>> >> experimented with that over at
> >>> >> https://github.com/numpy/numpy/pull/30828#issuecomment-3920553882 .
> >>> >> But the idea would be that we ask a contributor who has generated
> code
> >>> >> by AI to do this as part of the PR sign-off.   They should be in a
> >>> >> much better position to do this than the maintainers, as they should
> >>> >> have been exploring the problem themselves, and therefore should be
> >>> >> able to write better queries to guide the AI review.   And with the
> >>> >> prompts as a start, it's not particularly time-consuming.
> >>> >
> >>> > I think none of the arguments it produced are grounded in the
> principles of copyright law. Unfortunately, I think this is one of the
> areas where LLMs just generate plausible nonsense rather than sound legal
> analysis. Each thing that it noted was a one-liner or a general idea,
> nothing copyrightable. It's essentially writes like a median StackOverflow
> programmer with a dim understanding of copyright law (no slight intended to
> anyone; I am one). I've looked at the two files it suggested, and I see no
> similarity to the PR.
> >>> >
> >>> > I do kind of suspect that LLMs could be used, with care, to help
> facilitate the abstraction-filtration-comparison test and maybe find
> candidates to do that test on, but a general instruction to give arguments
> for copyright violation apparently yields more chaff to wade through.
> >>>
> >>> Yes, sure - and you can see me trying to negotiate with Gemini on
> >>> related points in an earlier session here:
> >>>
> >>> https://gist.github.com/matthew-brett/fac33f1b41d98e51b842f8bb84e8c66b
> >>>
> >>> My point was not that AI is doing a good job here - it isn't - but to
> >>> offer it as a starting point for further research for the PR author,
> >>> and reflection for those of us thinking about copyright and AI, on
> >>> what a better process might look like.
> >>
> >>
> >> IMO, it's definitely not a good starting point for the PR author. It
> doesn't matter where it places you as a starting point if it points you in
> the wrong direction. You are asking the PR author to defend against
> incorrect statements of fact and law.
> >>
> >> I think *some* kind of code search or plagiarism detection service
> might be helpful in identifying possible original sources to compare with
> the generated output. It's not at all clear that asking the LLM as an
> oracle actually enacts such a search. It plainly did not here, but it
> presented its work as such.
> >>
> >> I don't think it's a good policy to construct an ad hoc plagiarism
> detection service without validating how it actually performs. I really
> strongly suggest that you retract your PR comment. It would be one thing to
> try it out and post here about what you found, but to interact with a
> contributor that way as an experiment is... ill-advised.
> >>
> >
> > +1. The interaction on that PR as a whole struck me as harsh, verging on
> rude.
>
> You surely don't mean that it is harsh or rude to post the AI summary,
> along with:  "Obviously - as designed - this is deliberately Red Team.
> But @mdrdope - no pressure, and feel free not to answer - do you have
> any response to the Gemini comments?"
>

Yes. 100%. Irresponsible, too. It's as if a teacher decided to hack
together his own plagiarism detector, ran it on his students' work and
asked them to respond. Before actually seeing if the thing worked to *any*
extent on a known corpus. "no pressure" coming from someone in authority
(you have that "Member" tag; you represent the project when you interact
with its PRs) is meaningless.
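
(As an aside, "seeing if the thing worked on a known corpus" need not be
elaborate. A minimal sketch, with every name below hypothetical: run the
candidate detector over pairs of code samples whose true relationship is
already known, and count its hits and misses.)

# Hypothetical validation harness for any copying/plagiarism detector.
# labeled_pairs: iterable of (code_a, code_b, is_copy) triples where
# is_copy records the ground truth for that pair.
def evaluate_detector(detector, labeled_pairs):
    tp = fp = fn = 0
    for code_a, code_b, is_copy in labeled_pairs:
        flagged = detector(code_a, code_b)  # True if detector flags the pair
        tp += int(flagged and is_copy)
        fp += int(flagged and not is_copy)
        fn += int(not flagged and is_copy)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

Until something like that has been run, we know nothing about the
detector's false-positive or false-negative rates.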

You are responsible for the words you put there, whether you use an LLM to
generate them or not. Weasel words saying that the LLM may be wrong don't
absolve
you of this.

You knew that the specific things in there were wrong. You showed us a
previous conversation where you identified the wrong claims and found that
you could guide the LLM away from them. But in this final conversation, you
chose not to provide that guidance and in fact made it more aggressively
wrong. And this is the one that you chose to put forth live instead of
discussing it here, which is what I actually asked for. I just wanted a
gist to look at and evaluate *before* we decided as a project what to put
forth as policy.

> That's one of the advantages of asking the contributor themselves
> to do that review - it makes it less likely that they will take
> offense to the output of the AI.  Anyone using AI will know that it
> will frequently be wrong, and it will be more obvious to them that the
> AI output is not a judgment, but may serve as a starting point for
> reflection and investigation.   For example, it may draw the author,
> and the maintainers, into a more thoughtful and informed discussion of
> copyright.
>

You are rapidly draining my ability to believe that you are operating in
good faith. Rather, it increasingly seems like you are strawmanning a
particularly bad use of LLMs in order to make a point that LLMs are bad.

Good faith is a presumption. It can be undermined with experience.

-- 
Robert Kern