Hi,

This is just a plea for some careful thought at this point.
There are futures here that we likely don't want. For example, imagine NumPy filling up with large blocks of AI-generated code, and huge PRs that are effectively impossible for humans to review. As Oscar and Stefan have pointed out, consider what effect that is going to have on the social enterprise of open-source coding, and on our ability to train new contributors.

I believe we are also obliged to think hard about the consequences for copyright. We discussed that a bit here:

https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md

In particular, there is no good way to ensure that the AI has not sucked in copyrighted code, even if you've asked it to do a simple port of other, clearly licensed code. There is some evidence that AI coding agents are, for whatever reason, particularly reluctant to point to GPL licensing when asked for code attribution.

I don't think the argument that AI is inevitable is useful. Yes, it's clear that AI will be part of coding in some sense, but we have yet to work out what part that will be. For example, there are different models of AI use. Some of us are starting to generate large bodies of code with AI, as Matthew Rocklin describes:

https://matthewrocklin.com/ai-zealotry/

His discussion is useful. Here are some key quotes:

* "LLMs generate a lot of junk"
* "AI creates technical debt, but it can clean some of it up too. (at least at a certain granularity)"
* "The code we write with AI probably won't be as good as hand-crafted code, but we'll write 10x more of it"

Another experienced engineer reflecting on his use of AI:

"""
... LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually.

Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.
"""

https://x.com/karpathy/status/2015883857489522876

Conversely, Linus Torvalds has a different model of how AI should work:

"""
Torvalds said he's "much less interested in AI for writing code" and far more excited about "AI as the tool to help maintain code, including automated patch checking and code review before changes ever reach him."
"""

https://www.zdnet.com/article/linus-torvalds-ai-tool-maintaining-linux-code/

I guess y'all saw the recent Anthropic research paper comparing groups randomized to AI vs no-AI working on code problems. They found little speedup from AI, but a dramatic drop in the level of understanding of the library they were using (in fact, this was Trio). This effect was particularly marked for experienced developers - see their figure 7.

https://arxiv.org/pdf/2601.20245

But in general, my argument is that now is a good time to step back and ask where we want AI to fit into the open-source world. We open-source developers tend to care a lot about copyright, and we depend very greatly on the social aspects of coding, including our ability to train the next generation of developers, in the particular and informal way that we have learned. We have much to lose from careless use of AI.

Cheers,

Matthew
