Hi,

On Mon, Feb 23, 2026 at 11:18 AM Oscar Benjamin via NumPy-Discussion <[email protected]> wrote:
>
> On Mon, 23 Feb 2026 at 09:36, Ralf Gommers via NumPy-Discussion
> <[email protected]> wrote:
> >
> > On Sun, Feb 22, 2026 at 7:07 PM Oscar Benjamin via NumPy-Discussion
> > <[email protected]> wrote:
> >>
> >> Then codex went and looked at the PULL_REQUESTS_TEMPLATE.md, looked at
> >> the commits, and then produced a PR description matching that
> >> template. It filled out the AI disclosure part of the PR template for
> >> me
> >> ```
> >> #### AI Generation Disclosure
> >>
> >> Used ChatGPT to help draft PR text only. No code changes were
> >> AI-generated in this PR.
> >> ```
> >> Both of those sentences are false and it just lied automatically on my
> >> behalf without me asking it to do that and without asking for any
> >> clarification about what to put there.
> >
> > Based on what you wrote, that seems like user error to me. The commits on
> > the branch you made the PR from do not include the `Generated-by` or
> > `Co-authored-by` attribution to indicate that those commits were generated
> > by an LLM in part or in full. So if you ask Codex in a fresh session, where
> > it doesn't have context about the previous work, to look at that branch /
> > those commits, how is it supposed to know that the commit authorship on
> > those commits is in fact incorrect?
>
> It could have said "I don't have the information needed to fill out
> this part of the template so can you answer these questions" but it
> didn't and just falsified the missing information instead. The full
> description it wrote was quite long (over a screenful) so you could
> miss that AI part if not looking closely. Note that what it wrote
> there is pretty much the most common thing that people put in the AI
> disclosure and it is very often obviously false.
>
> > It's indeed possible that there is a model that deliberately and
> > systematically lies in order to increase the chances of it being accepted,
> > but it's much more likely that the PR message draft you ask for is actually
> > correct based on the commit history.
>
> Maybe I should put Co-authored-by then. I didn't actually let codex
> run git commit itself (I was using git myself in a separate terminal
> to track what it was doing).
>
> > tl;dr seems to work as advertised. And inaccuracies and omissions are still
> > the responsibility of the human in the loop.
>
> It is the responsibility of the human in the loop but the most common
> failure modes we see right now are:
>
> - They just delete the entire pull request template and insert something else.
> - They specifically delete the AI part of the template.
> - The whole description is AI generated and the human has not reviewed
>   it at all.
>
> I tested what codex would do because my suspicion is that when they
> have deleted the entire template it is because they are using some
> kind of (possibly AI) tooling to open the pull request and therefore
> not actually reading the template in the web UI. I'm not sure what
> they are using though because if you use e.g. codex then it is smart
> enough to follow the PR template even if that means filling in the
> blanks with false information.
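[Editor's note: for context, the `Co-authored-by` / `Generated-by` attribution discussed above is a Git commit "trailer": a `Key: value` line in the final paragraph of a commit message. A minimal sketch of how such trailers could be recorded and later inspected, using a throwaway repository; the names, email addresses, and tool labels below are placeholders, not an attribution convention endorsed by NumPy:]

```shell
# Sketch: recording AI-assistance trailers at commit time (placeholder values).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name="Demo User" -c user.email="[email protected]" \
    commit -q --allow-empty \
    -m "DOC: example change" \
    -m "Co-authored-by: ChatGPT <[email protected]>
Generated-by: OpenAI Codex (placeholder tool label)"
# Anyone reviewing the branch can then see the attribution in the history:
git log -1 --format=%B
```

A fresh session of an agent (or a human reviewer) reading those commits would then have the attribution available, rather than having to guess.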
Yes - and the more general point is that we can't depend on the AI not
making stuff up for the checkboxes.

I think that's the big problem with the (Torvalds) approach of "it's
just another tool". That's not really true; it's much more than that -
it's a whole other way of working. It would perhaps be better to say
"it's another type of developer" - one where the usual trust
relationships that are so central to open source are not valid. It
would be a terrible mistake to apply to the AI the rules that we
evolved for working with humans.

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
