On Mon, 23 Feb 2026 at 09:36, Ralf Gommers via NumPy-Discussion
<[email protected]> wrote:
>
> On Sun, Feb 22, 2026 at 7:07 PM Oscar Benjamin via NumPy-Discussion 
> <[email protected]> wrote:
>>
>> Then codex went and looked at the PULL_REQUESTS_TEMPLATE.md, looked at
>> the commits, and then produced a PR description matching that
>> template. It filled out the AI disclosure part of the PR template for
>> me
>> ```
>>   #### AI Generation Disclosure
>>
>>   Used ChatGPT to help draft PR text only. No code changes were
>>   AI-generated in this PR.
>> ```
>> Both of those sentences are false and it just lied automatically on my
>> behalf without me asking it to do that and without asking for any
>> clarification about what to put there.
>
>
> Based on what you wrote, that seems like user error to me. The commits on the 
> branch you made the PR from do not include the `Generated-by` or 
> `Co-authored-by` attribution to indicate that those commits were generated by 
> an LLM in part or in full. So if you ask Codex in a fresh session, where it 
> doesn't have context about the previous work, to look at that branch / those 
> commits, how is it supposed to know that the commit authorship on those 
> commits is in fact incorrect?

It could have said "I don't have the information needed to fill out
this part of the template, so can you answer these questions?" but it
didn't; it just falsified the missing information instead. The full
description it wrote was quite long (over a screenful), so you could
miss the AI part if you were not looking closely. Note that what it
wrote there is pretty much the most common thing that people put in
the AI disclosure, and it is very often obviously false.

> It's indeed possible that there is a model that deliberately and 
> systematically lies in order to increase the chances of it being accepted, 
> but it's much more likely that the PR message draft you ask for is actually 
> correct based on the commit history.

Maybe I should add Co-authored-by to the commits then. I didn't
actually let codex run git commit itself (I was using git myself in a
separate terminal to track what it was doing).
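
For illustration, here is a minimal sketch of how a tool could check
whether a commit message carries one of the attribution trailers
mentioned above before it fills in a disclosure section. The function
name and the sample message are hypothetical, and the parser is a
simplified stand-in (it only looks at the last paragraph of the
message), not git's own trailer logic:

```python
import re

# Trailer keys named in this thread; compared case-insensitively.
ATTRIBUTION_KEYS = ("generated-by", "co-authored-by")

# A trailer line looks like "Key: value".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


def find_attribution_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs for attribution trailers in a commit message.

    Simplification (an assumption of this sketch): only the final
    paragraph of the message is treated as the trailer block.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        return []  # subject line only, so there is no trailer block
    found = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match and match.group(1).lower() in ATTRIBUTION_KEYS:
            found.append((match.group(1), match.group(2).strip()))
    return found


# Hypothetical commit message used only to exercise the function.
sample = (
    "Fix overflow in foo\n"
    "\n"
    "Longer explanation of the change.\n"
    "\n"
    "Co-authored-by: Some Tool <tool@example.invalid>\n"
    "Signed-off-by: A Dev <dev@example.invalid>\n"
)
print(find_attribution_trailers(sample))
```

An empty result only means the commit says nothing either way, so the
sensible reaction would be to ask the user rather than to assert that
no AI was involved.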

> tl;dr seems to work as advertised. And inaccuracies and omissions are still 
> the responsibility of the human in the loop.

It is the responsibility of the human in the loop, but the most common
failure modes we see right now are:

- They just delete the entire pull request template and insert something else.
- They specifically delete the AI part of the template.
- The whole description is AI generated and the human has not reviewed
  it at all.

I tested what codex would do because my suspicion is that when
contributors have deleted the entire template, it is because they are
using some kind of (possibly AI) tooling to open the pull request and
are therefore not actually reading the template in the web UI. I'm not
sure what they are using, though, because if you use e.g. codex then
it is smart enough to follow the PR template, even if that means
filling in the blanks with false information.

--
Oscar
