Hi,

This is just a plea for some careful thought at this point.

There are futures here that we likely don't want.  For example,
imagine NumPy filling up with large blocks of AI-generated code, and
huge PRs that are effectively impossible for humans to review.   As
Oscar and Stefan have pointed out - consider what effect that is going
to have on the social enterprise of open-source coding - and our
ability to train new contributors.

I believe we are also obliged to think hard about the consequences for
copyright.   We discussed that a bit here:

https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

In particular - there is no good way to ensure that the AI has not
sucked in copyrighted code - even if you've asked it to do a simple
port of other, clearly-licensed code.  There is some evidence that
AI coding agents are, for whatever reason, particularly reluctant to
point to GPL licensing when asked for code attribution.

I don't think the argument that AI is inevitable is useful - yes, it's
clear that AI will be part of coding in some sense, but we have yet to
work out what part that will be.

For example, there are different models of AI use - some of us are
starting to generate large bodies of code with AI - such as Matthew
Rocklin: https://matthewrocklin.com/ai-zealotry/ - but his discussion
is useful.  Here are three key quotes:

* "LLMs generate a lot of junk"
* "AI creates technical debt, but it can clean some of it up too. (at
least at a certain granularity)"
* "The code we write with AI probably won't be as good as hand-crafted
code, but we'll write 10x more of it"

Another experienced engineer, Andrej Karpathy, reflecting on his use of AI:

""" ...  LLM coding will split up engineers based on those who
primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my
ability to write code manually. Generation (writing code) and
discrimination (reading code) are different capabilities in the brain.
Largely due to all the little mostly syntactic details involved in
programming, you can review code just fine even if you struggle to
write it.
"""

https://x.com/karpathy/status/2015883857489522876

Conversely - Linus Torvalds has a different model of how AI should work:

"""
Torvalds said he's "much less interested in AI for writing code" and
far more excited about "AI as the tool to help maintain code,
including automated patch checking and code review before changes ever
reach him."
"""

https://www.zdnet.com/article/linus-torvalds-ai-tool-maintaining-linux-code/

I guess y'all saw the recent Anthropic research paper comparing groups
randomized to AI vs. no-AI conditions working on code problems.  They found little
speedup from AI, but a dramatic drop in the level of understanding of
the library they were using (in fact this was Trio).   This effect was
particularly marked for experienced developers - see their figure 7.

https://arxiv.org/pdf/2601.20245

But in general - my argument is that now is a good time to step back
and ask where we want AI to fit into the open-source world.  We
open-source developers tend to care a lot about copyright, and we
depend heavily on the social aspects of coding, including our
ability to train the next generation of developers in the particular,
informal way that we have learned to do that.  We have much to lose
from careless use of AI.

Cheers,

Matthew