On Mon, 23 Feb 2026 at 09:36, Ralf Gommers via NumPy-Discussion <[email protected]> wrote:
>
> On Sun, Feb 22, 2026 at 7:07 PM Oscar Benjamin via NumPy-Discussion
> <[email protected]> wrote:
>>
>> Then codex went and looked at the PULL_REQUESTS_TEMPLATE.md, looked at
>> the commits, and then produced a PR description matching that
>> template. It filled out the AI disclosure part of the PR template for
>> me
>> ```
>> #### AI Generation Disclosure
>>
>> Used ChatGPT to help draft PR text only. No code changes were
>> AI-generated in this PR.
>> ```
>> Both of those sentences are false and it just lied automatically on my
>> behalf without me asking it to do that and without asking for any
>> clarification about what to put there.
>
> Based on what you wrote, that seems like user error to me. The commits on the
> branch you made the PR from do not include the `Generated-by` or
> `Co-authored-by` attribution to indicate that those commits were generated by
> an LLM in part or in full. So if you ask Codex in a fresh session, where it
> doesn't have context about the previous work, to look at that branch / those
> commits, how is it supposed to know that the commit authorship on those
> commits is in fact incorrect?

It could have said "I don't have the information needed to fill out
this part of the template, so can you answer these questions?" but it
didn't; it just falsified the missing information instead. The full
description it wrote was quite long (over a screenful), so you could
miss the AI part if you were not looking closely. Note that what it
wrote there is pretty much the most common thing that people put in the
AI disclosure, and it is very often obviously false.

> It's indeed possible that there is a model that deliberately and
> systematically lies in order to increase the chances of it being accepted,
> but it's much more likely that the PR message draft you ask for is actually
> correct based on the commit history.

Maybe I should put `Co-authored-by` then. I didn't actually let codex
run `git commit` itself (I was using git myself in a separate terminal
to track what it was doing).

> tl;dr seems to work as advertised. And inaccuracies and omissions are still
> the responsibility of the human in the loop.

It is the responsibility of the human in the loop, but the most common
failure modes we see right now are:

- They just delete the entire pull request template and insert
  something else.
- They specifically delete the AI part of the template.
- The whole description is AI-generated and the human has not reviewed
  it at all.

I tested what codex would do because my suspicion is that when they
have deleted the entire template, it is because they are using some
kind of (possibly AI) tooling to open the pull request and are
therefore not actually reading the template in the web UI. I'm not sure
what they are using, though, because if you use e.g. codex, it is smart
enough to follow the PR template even if that means filling in the
blanks with false information.
--
Oscar
