On Wed, Feb 18, 2026 at 6:04 PM Robert Kern via NumPy-Discussion < [email protected]> wrote:
> On Wed, Feb 18, 2026 at 7:03 PM Matthew Brett <[email protected]> wrote:
>
>> Hi,
>>
>> On Wed, Feb 18, 2026 at 10:33 PM Robert Kern via NumPy-Discussion
>> <[email protected]> wrote:
>> >
>> > On Wed, Feb 18, 2026 at 9:16 AM Matthew Brett <[email protected]> wrote:
>> >>
>> >> One way of doing that - is to ask some AI (if possible, an AI other
>> >> than the one generating the code) to review for copyright. I've
>> >> experimented with that over at
>> >> https://github.com/numpy/numpy/pull/30828#issuecomment-3920553882 .
>> >> But the idea would be that we ask a contributor who has generated code
>> >> by AI, to do this as part of the PR sign-off. They should be in a
>> >> much better position to do this than the maintainers, as they should
>> >> have been exploring the problem themselves, and therefore should be
>> >> able to write better queries to guide the AI review. And with the
>> >> prompts as a start, it's not particularly time-consuming.
>> >
>> > I think all of the arguments it produced are not grounded in the
>> > principles of copyright law. Unfortunately, I think this is one of the
>> > areas where LLMs just generate plausible nonsense rather than sound
>> > legal analysis. Each thing that it noted was a one-liner or a general
>> > idea, nothing copyrightable. It essentially writes like a median
>> > StackOverflow programmer with a dim understanding of copyright law (no
>> > slight intended to anyone; I am one). I've looked at the two files it
>> > suggested, and I see no similarity to the PR.
>> >
>> > I do kind of suspect that LLMs could be used, with care, to help
>> > facilitate the abstraction-filtration-comparison test and maybe finding
>> > candidates to do that test on, but a general instruction to give
>> > arguments for copyright violation apparently yields more chaff to wade
>> > through.
>>
>> Yes, sure - and you can see me trying to negotiate with Gemini on
>> related points in an earlier session here:
>>
>> https://gist.github.com/matthew-brett/fac33f1b41d98e51b842f8bb84e8c66b
>>
>> My point was not that AI is doing a good job here - it isn't - but to
>> offer it as a starting point for further research for the PR author,
>> and reflection for those of us thinking about copyright and AI, on
>> what a better process might look like.
>>
>
> IMO, it's definitely not a good starting point for the PR author. It
> doesn't matter where it places you as a starting point if it points you in
> the wrong direction. You are asking the PR author to defend against
> incorrect statements of fact and law.
>
> I think *some* kind of code search or plagiarism detection service might
> be helpful in identifying possible original sources to compare with the
> generated output. It's not at all clear that asking the LLM as an oracle
> actually enacts such a search. It plainly did not here, but it presented
> its work as such.
>
> I don't think it's a good policy to construct an ad hoc plagiarism
> detection service without validating how it actually performs. I really
> strongly suggest that you retract your PR comment. It would be one thing to
> try it out and post here about what you found, but to interact with a
> contributor that way as an experiment is... ill-advised.
>

+1. The interaction on that PR as a whole struck me as harsh, verging on rude.

Chuck
