On Wed, Feb 18, 2026 at 9:16 AM Matthew Brett <[email protected]> wrote:
> Hi,
>
> On Sat, Feb 14, 2026 at 5:38 PM Robert Kern <[email protected]> wrote:
> >
> > On Sat, Feb 14, 2026 at 12:17 PM Matthew Brett <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Feb 13, 2026 at 9:45 PM Robert Kern <[email protected]> wrote:
> >> >
> >> > On Wed, Feb 11, 2026 at 6:26 PM Matthew Brett via NumPy-Discussion <[email protected]> wrote:
> >> >>
> >> >> Just to clarify - in case it wasn't clear, what I'm floating as a proposal, would be something like this, as a message to PR authors:
> >> >>
> >> >> Please specify one of these:
> >> >>
> >> >> 1) I wrote this code myself, without looking at significant AI-generated code OR
> >> >> 2) The code contains AI-generated content, but the AI-generated code is sufficiently trivial that it cannot reasonably be subject to copyright OR
> >> >> 3) There is non-trivial AI-generated code in this PR, and I have documented my searches to confirm that no parts of the code are subject to existing copyright.
> >> >>
> >> >> So - the burden for the reviewer is just to confirm, in case 3, that the author has documented their searches. We take the word of the contributor for the option they have chosen. Obviously, the documentation requirement of case 3 is somewhat of a burden for the contributor, and may therefore encourage them to write the code themselves, to avoid that burden. That might not be a bad thing, long term, for the project, and it seems reasonable to me as some defence against copyright violation, and a message that the project cares about such violation.
> >> >
> >> > For Case 3, I would love to see an example of the search that you would accept. If you could take a recent PR (human or AI, doesn't really matter for this purpose), and show the search that would satisfy you, that would go a long way towards clarifying what you are asking for here. We'd need a worked example or two before adopting this policy because if I don't know what you are asking for, no new contributor will, either.
> >>
> >> Yes, that's a reasonable request. But how do you think I should
> >> proceed? Make an issue on Numpy, and start drafting? Start another
> >> email thread? Or a Discourse / Scientific Python thread?
> >
> > Just here should be fine. Take an existing PR that has copyrightable content (e.g. an entire new function or three, each more than ~10 lines, not just many one-line updates scattered around; the most interesting ones would be those that implement a known algorithm). Do the code search that would satisfy you. Write out here what you would want a PR author to provide.
>
> I'd suggested (off-list) that this might be better done in another
> thread - but perhaps it can be done here.
>
> Reflecting, and experimenting - there are many caveats, but I think it
> is reasonable to give the contributor some responsibility for formal
> care about copyright.
>
> One way of doing that - is to ask some AI (if possible, an AI other
> than the one generating the code) to review for copyright. I've
> experimented with that over at
> https://github.com/numpy/numpy/pull/30828#issuecomment-3920553882 .
> But the idea would be that we ask a contributor who has generated code
> by AI, to do this as part of the PR sign-off. They should be in a
> much better position to do this than the maintainers, as they should
> have been exploring the problem themselves, and therefore should be
> able to write better queries to guide the AI review. And with the
> prompts as a start, it's not particularly time-consuming.

I think all of the arguments it produced are not grounded in the principles of copyright law. Unfortunately, I think this is one of the areas where LLMs just generate plausible nonsense rather than sound legal analysis.
Each thing that it noted was a one-liner or a general idea, nothing copyrightable. It essentially writes like a median StackOverflow programmer with a dim understanding of copyright law (no slight intended to anyone; I am one). I've looked at the two files it suggested, and I see no similarity to the PR.

I do kind of suspect that LLMs could be used, with care, to help facilitate the abstraction-filtration-comparison test <https://en.wikipedia.org/wiki/Abstraction-Filtration-Comparison_test>, and maybe to find candidates to run that test on, but a general instruction to give arguments for copyright violation apparently just yields more chaff to wade through.

--
Robert Kern
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
