[Numpy-discussion] Re: Current policy on AI-generated code in NumPy

Matthew Brett via NumPy-Discussion Sat, 07 Feb 2026 09:55:38 -0800

Hi

On Sat, Feb 7, 2026 at 4:54 PM Charles R Harris
<[email protected]> wrote:
>
>
>
> On Sat, Feb 7, 2026 at 7:05 AM Matthew Brett via NumPy-Discussion 
> <[email protected]> wrote:
>>
>> Hi,
>>
>> This is just a plea for some careful thought at this point.
>>
>> There are futures here that we likely don't want.  For example,
>> imagine Numpy filling up with large blocks of AI-generated code, and
>> huge PRs that are effectively impossible for humans to review.   As
>> Oscar and Stefan have pointed out - consider what effect that is going
>> to have on the social enterprise of open-source coding - and our
>> ability to train new contributors.
>>
>> I believe we are also obliged to think hard about the consequences for
>> copyright.   We discussed that a bit here:
>>
>> https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md
>>
>> In particular - there is no good way to ensure that the AI has not
>> sucked in copyrighted code - even if you've asked it to do a simple
>> port of other and clearly licensed code.  There is some evidence that
>> AI coding agents are, for whatever reason, particularly reluctant to
>> point to GPL-licensing, when asked for code attribution.
>>
>> I don't think the argument that AI is inevitable is useful - yes, it's
>> clear that AI will be part of coding in some sense, but we have yet to
>> work out what part that will be.
>>
>> For example, there are different models of AI use - some of us are
>> starting to generate large bodies of code with AI - such as Matthew
>> Rocklin : https://matthewrocklin.com/ai-zealotry/ - but his discussion
>> is useful.  Here are three key quotes:
>>
>> * "LLMs generate a lot of junk"
>> * "AI creates technical debt, but it can clean some of it up too. (at
>> least at a certain granularity)"
>> * "The code we write with AI probably won't be as good as hand-crafted
>> code, but we'll write 10x more of it"
>>
>> https://matthewrocklin.com/ai-zealotry/
>>
>> Another experienced engineer reflecting on his use of AI:
>>
>> """ ...  LLM coding will split up engineers based on those who
>> primarily liked coding and those who primarily liked building.
>>
>> Atrophy. I've already noticed that I am slowly starting to atrophy my
>> ability to write code manually. Generation (writing code) and
>> discrimination (reading code) are different capabilities in the brain.
>> Largely due to all the little mostly syntactic details involved in
>> programming, you can review code just fine even if you struggle to
>> write it.
>> """
>>
>> https://x.com/karpathy/status/2015883857489522876
>>
>> Conversely - Linus Torvalds has a different model of how AI should work:
>>
>> """
>> Torvalds said he's "much less interested in AI for writing code" and
>> far more excited about "AI as the tool to help maintain code,
>> including automated patch checking and code review before changes ever
>> reach him."
>> """
>>
>> https://www.zdnet.com/article/linus-torvalds-ai-tool-maintaining-linux-code/
>>
>> I guess y'all saw the recent Anthropic research paper comparing groups
>> randomized to AI vs no-AI working on code problems.  They found little
>> speedup from AI, but a dramatic drop in the level of understanding of
>> the library they were using (in fact this was Trio).   This effect was
>> particularly marked for experienced developers - see their figure 7.
>>
>> https://arxiv.org/pdf/2601.20245
>>
>> But in general - my argument is that now is a good time to step back
>> and ask where we want AI to fit into the open-source world.  We
>> open-source developers tend to care a lot about copyright, and we
>> depend very greatly on the social aspects of coding, including our
>> ability to train the next generation of developers, in the particular
>> and informal way that we have learned.   We have much to lose from
>> careless use of AI.
>>
>
> E. S. Raymond is another recent convert.
>
> Programming with AI assistance is very revealing. It turns out I'm not quite 
> who I thought I was.
>
> There are a lot of programmers out there who have a tremendous amount of ego 
> and identity invested in the craft of coding. In knowing how to beat useful 
> and correct behavior out of one language and system environment, or better 
> yet many.
>
> If you asked me a week ago, I might have said I was one of those people. But 
> a curious thing has occurred. LLMs are so good now that I can validate and 
> generate a tremendous amount of code while doing hardly any hand-coding at 
> all.
>
> And it's dawning on me that I don't miss it.
>
> Things are moving fast.


Yes - but - it's important to separate how people feel when using AI
from the actual outcome.   Many of y'all will, I am sure, have seen this
study:

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

that showed that developers estimated they would get about a 25% speedup
from AI, before they did the task; after they did the task, they felt
they had got a 20% speedup, and in fact (compared to matched
tasks without AI), they suffered about a 20% slowdown.

Personally - I am not very egotistical about my code, but I am
extremely suspicious.   I know my tendency to become sloppy, to make
and miss mistakes - what David Donoho called "the ubiquity of error":
https://blog.nipy.org/ubiquity-of-error.html .   So AI makes me
increasingly uncomfortable, as I feel my skill starting to atrophy (in
the words of Andrej Karpathy quoted above).

So it seems to me we have to take someone like Linus Torvalds
seriously when he says he's "much less interested in AI for writing
code".   Perhaps it is possible, at some point, to show that
delegating coding to the AI leads to increased learning and greater
ability to spot error - but so far the evidence seems to go the other
way.   And if we "embrace" AI for that use, we run the risk of
deskilling ourselves, filling the code-base with maintenance debt,
effectively voiding copyright, and making it much harder to train the
next generation.

Cheers,

Matthew




--
This email is fully human-source.    Unless I'm quoting AI, I did not
use AI for any text in this email.
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
