Hi,

On Wed, Feb 18, 2026 at 10:33 PM Robert Kern via NumPy-Discussion <[email protected]> wrote:
>
> On Wed, Feb 18, 2026 at 9:16 AM Matthew Brett <[email protected]> wrote:
>>
>> Hi,
>>
>> On Sat, Feb 14, 2026 at 5:38 PM Robert Kern <[email protected]> wrote:
>> >
>> > On Sat, Feb 14, 2026 at 12:17 PM Matthew Brett <[email protected]>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Fri, Feb 13, 2026 at 9:45 PM Robert Kern <[email protected]> wrote:
>> >> >
>> >> > On Wed, Feb 11, 2026 at 6:26 PM Matthew Brett via NumPy-Discussion
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >>
>> >> >> Just to clarify - in case it wasn't clear, what I'm floating as a
>> >> >> proposal, would be something like this, as a message to PR authors:
>> >> >>
>> >> >> Please specify one of these:
>> >> >>
>> >> >> 1) I wrote this code myself, without looking at significant
>> >> >> AI-generated code OR
>> >> >> 2) The code contains AI-generated content, but the AI-generated code
>> >> >> is sufficiently trivial that it cannot reasonably be subject to
>> >> >> copyright OR
>> >> >> 3) There is non-trivial AI-generated code in this PR, and I have
>> >> >> documented my searches to confirm that no parts of the code are
>> >> >> subject to existing copyright.
>> >> >>
>> >> >> So - the burden for the reviewer is just to confirm, in case 3, that
>> >> >> the author has documented their searches. We take the word of the
>> >> >> contributor for the option they have chosen. Obviously, the
>> >> >> documentation requirement of case 3 is somewhat of a burden for the
>> >> >> contributor, and may therefore encourage them to write the code
>> >> >> themselves, to avoid that burden. That might not be a bad thing, long
>> >> >> term, for the project, and it seems reasonable to me as some defence
>> >> >> against copyright violation, and a message that the project cares
>> >> >> about such violation.
>> >> >
>> >> >
>> >> > For Case 3, I would love to see an example of the search that you would
>> >> > accept. If you could take a recent PR (human or AI, doesn't really
>> >> > matter for this purpose), and show the search that would satisfy you,
>> >> > that would go a long way towards clarifying what you are asking for
>> >> > here. We'd need a worked example or two before adopting this policy
>> >> > because if I don't know what you are asking for, no new contributor
>> >> > will, either.
>> >>
>> >> Yes, that's a reasonable request. But how do you think I should
>> >> proceed? Make an issue on Numpy, and start drafting? Start another
>> >> email thread? Or a Discourse / Scientific Python thread?
>> >
>> >
>> > Just here should be fine. Take an existing PR that has copyrightable
>> > content (e.g. an entire new function or three, each more than ~10 lines,
>> > not just many one-line updates scattered around; the most interesting ones
>> > would be those that implement a known algorithm). Do the code search that
>> > would satisfy you. Write out here what you would want a PR author to
>> > provide.
>>
>> I'd suggested (off-list) that this might be better done in another
>> thread - but perhaps it can be done here.
>>
>> Reflecting, and experimenting - there are many caveats, but I think it
>> is reasonable to give the contributor some responsibility for formal
>> care about copyright.
>>
>> One way of doing that - is to ask some AI (if possible, an AI other
>> than the one generating the code) to review for copyright. I've
>> experimented with that over at
>> https://github.com/numpy/numpy/pull/30828#issuecomment-3920553882 .
>> But the idea would be that we ask a contributor who has generated code
>> by AI, to do this as part of the PR sign-off.
>> They should be in a
>> much better position to do this than the maintainers, as they should
>> have been exploring the problem themselves, and therefore should be
>> able to write better queries to guide the AI review. And with the
>> prompts as a start, it's not particularly time-consuming.
>
>
> I think all of the arguments it produced are not grounded in the principles
> of copyright law. Unfortunately, I think this is one of the areas where LLMs
> just generate plausible nonsense rather than sound legal analysis. Each thing
> that it noted was a one-liner or a general idea, nothing copyrightable. It
> essentially writes like a median StackOverflow programmer with a dim
> understanding of copyright law (no slight intended to anyone; I am one). I've
> looked at the two files it suggested, and I see no similarity to the PR.
>
> I do kind of suspect that LLMs could be used, with care, to help facilitate
> the abstraction-filtration-comparison test and maybe find candidates to do
> that test on, but a general instruction to give arguments for copyright
> violation apparently yields more chaff to wade through.

Yes, sure - and you can see me trying to negotiate with Gemini on related
points in an earlier session here:

https://gist.github.com/matthew-brett/fac33f1b41d98e51b842f8bb84e8c66b

My point was not that AI is doing a good job here - it isn't - but to offer it
as a starting point for further research for the PR author, and reflection for
those of us thinking about copyright and AI, on what a better process might
look like.

Cheers,

Matthew
