Hi,

On Mon, Feb 23, 2026 at 11:18 AM Oscar Benjamin via NumPy-Discussion
<[email protected]> wrote:
>
> On Mon, 23 Feb 2026 at 09:36, Ralf Gommers via NumPy-Discussion
> <[email protected]> wrote:
> >
> > On Sun, Feb 22, 2026 at 7:07 PM Oscar Benjamin via NumPy-Discussion 
> > <[email protected]> wrote:
> >>
> >> Then codex went and looked at the PULL_REQUESTS_TEMPLATE.md, looked at
> >> the commits, and then produced a PR description matching that
> >> template. It filled out the AI disclosure part of the PR template for
> >> me
> >> ```
> >>   #### AI Generation Disclosure
> >>
> >>   Used ChatGPT to help draft PR text only. No code changes were
> >> AI-generated in this PR.
> >> ```
> >> Both of those sentences are false and it just lied automatically on my
> >> behalf without me asking it to do that and without asking for any
> >> clarification about what to put there.
> >
> >
> > Based on what you wrote, that seems like user error to me. The commits on 
> > the branch you made the PR from do not include the `Generated-by` or 
> > `Co-authored-by` attribution to indicate that those commits were generated 
> > by an LLM in part or in full. So if you ask Codex in a fresh session, where 
> > it doesn't have context about the previous work, to look at that branch / 
> > those commits, how is it supposed to know that the commit authorship on 
> > those commits is in fact incorrect?
>
> It could have said "I don't have the information needed to fill out
> this part of the template, so can you answer these questions?" but it
> didn't, and just falsified the missing information instead. The full
> description it wrote was quite long (over a screenful), so you could
> miss the AI part if not looking closely. Note that what it wrote
> there is pretty much the most common thing that people put in the AI
> disclosure, and it is very often obviously false.
>
> > It's indeed possible that there is a model that deliberately and 
> > systematically lies in order to increase the chances of it being accepted, 
> > but it's much more likely that the PR message draft you ask for is actually 
> > correct based on the commit history.
>
> Maybe I should put Co-authored-by then. I didn't actually let codex
> run git commit itself (I was using git myself in a separate terminal
> to track what it was doing).
>
> > tl;dr seems to work as advertised. And inaccuracies and omissions are still 
> > the responsibility of the human in the loop.
>
> It is the responsibility of the human in the loop but the most common
> failure modes we see right now are:
>
> - They just delete the entire pull request template and insert something else.
> - They specifically delete the AI part of the template.
> - The whole description is AI generated and the human has not reviewed
> it at all.
>
> I tested what codex would do because my suspicion is that when they
> have deleted the entire template it is because they are using some
> kind of (possibly AI) tooling to open the pull request and therefore
> not actually reading the template in the web UI. I'm not sure what
> they are using though because if you use e.g. codex then it is smart
> enough to follow the PR template even if that means filling in the
> blanks with false information.

Yes - and the more general point is that we can't depend on the AI not
making things up for the checkboxes.

I think that's the big problem with the (Torvalds) approach of "it's
just another tool". That's not really true; it's much more than that -
it's a whole other way of working. It would perhaps be better to say
"it's another type of developer" - one where the usual trust
relationships that are so central to open source are not valid. It
would be a terrible mistake to apply rules that we evolved for working
with humans to the AI.

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]