Write out here what you would want a PR author to provide.

Please let me honk in from philosophy/science/psychology to reiterate,

"The prompts, Boss, the prompts!"

Don't you think that some sample at least would add some conviction that a human was involved?

Case in point is mrdope, who I imagine could be a person talking through an LLM out of shyness, overproductivity syndrome, or as an experiment. Per my comment:

https://github.com/numpy/numpy/pull/30828 [1].
Is the work the product of AI? Yes, but the author claims to have verified the code. Is the author AI or not? Should we proceed with the PR?
Matti

The interactions ... have the flat agreeableness of an LLM. mrdope, are you here and clear? :-)

Bill

--

https://github.com/phobrain/phobrain

"Don't dope me, I'm just the racehorse here."

On 2026-02-14 09:38, Robert Kern via NumPy-Discussion wrote:

On Sat, Feb 14, 2026 at 12:17 PM Matthew Brett <[email protected]> wrote:

Hi,

On Fri, Feb 13, 2026 at 9:45 PM Robert Kern <[email protected]> wrote:

On Wed, Feb 11, 2026 at 6:26 PM Matthew Brett via NumPy-Discussion <[email protected]> wrote:


Just to clarify - in case it wasn't clear, what I'm floating as a proposal, would be something like this, as a message to PR authors:

Please specify one of these:

1) I wrote this code myself, without looking at significant AI-generated code OR

2) The code contains AI-generated content, but the AI-generated code is sufficiently trivial that it cannot reasonably be subject to copyright OR

3) There is non-trivial AI-generated code in this PR, and I have documented my searches to confirm that no parts of the code are subject to existing copyright.

So - the burden for the reviewer is just to confirm, in case 3, that the author has documented their searches. We take the word of the contributor for the option they have chosen. Obviously, the documentation requirement of case 3 is somewhat of a burden for the contributor, and may therefore encourage them to write the code themselves, to avoid that burden. That might not be a bad thing, long term, for the project, and it seems reasonable to me as some defence against copyright violation, and a message that the project cares about such violation.


For Case 3, I would love to see an example of the search that you would accept. If you could take a recent PR (human or AI, doesn't really matter for this purpose), and show the search that would satisfy you, that would go a long way towards clarifying what you are asking for here. We'd need a worked example or two before adopting this policy because if I don't know what you are asking for, no new contributor will, either.

Yes, that's a reasonable request. But how do you think I should
proceed? Make an issue on NumPy, and start drafting? Start another
email thread? Or a Discourse / Scientific Python thread?

Just here should be fine. Take an existing PR that has copyrightable content (e.g. an entire new function or three, each more than ~10 lines, not just many one-line updates scattered around; the most interesting ones would be those that implement a known algorithm). Do the code search that would satisfy you. Write out here what you would want a PR author to provide.
--
Robert Kern
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]


Links:
------
[1] https://github.com/numpy/numpy/pull/30828
