Re: Matthew Brett's comments about copyright and his write-up with Paul Ivanov, I want to share a couple of posts from Matthew Butterick, and a related lawsuit:
https://matthewbutterick.com/chron/this-copilot-is-stupid-and-wants-to-kill-me.html
https://githubcopilotinvestigation.com/
https://githubcopilotlitigation.com/

I'm guessing some people will be familiar with these, but if you're not, I hope you'll read and consider the arguments. Perhaps most relevant (from the "investigation" post):

> Just this week, Texas A&M professor Tim Davis
> <https://people.engr.tamu.edu/davis/welcome.html> gave numerous examples
> <https://twitter.com/DocSparse/status/1581461734665367554?cxt=HHwWhMDRibPEvfIrAAAA>
> of large chunks of his code being copied verbatim by Copilot, including
> when he prompted Copilot with the comment
> <https://twitter.com/DocSparse/status/1581637250927906816>
> /* sparse matrix transpose in the style of Tim Davis */.

So clearly coding LLMs memorize, but as currently engineered, most do not provide citations or links to licenses. (They *could* <https://madoc.bib.uni-mannheim.de/71146/1/heiBOOKS-1652-978-3-911056-51-9-CH30-2.pdf>, but AFAIK they don't. Butterick points this out as well.) Previous work suggests it's easy to extract code but hard to extract authorship (e.g., https://arxiv.org/pdf/2012.07805, page 10), and that larger models tend to memorize more (https://dl.acm.org/doi/pdf/10.1145/3597503.3639074). Likewise, "most LLMs fail to provide accurate license information, particularly for code under copyleft licenses" (https://arxiv.org/abs/2408.02487v1).

The example of the prompt "matrix code in the style of Tim Davis" shows that it's not as simple as "more examples in training data = more memorization". For scientific software, the number of examples in the training data will always be <<< the number of examples of, say, CRUD apps. My guess is that, with a very specific prompt, if you try to generate scientific code, you will be much more likely to violate an open-source license. (One could test this, of course, probably with an approach like this: https://arxiv.org/abs/2601.02671)

I don't want to be a random person getting sanctimonious on the mailing list. But I really value the ethics of open source software, the amazing contributions of all the numpy developers, and of scientist-coders more broadly. I get the appeal of coding LLMs. *And* I agree with Butterick that, as currently designed, they break the ethical compact of open source. I would hate to see numpy and the ecosystem around it move in that direction.
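To make the "one could test this" point concrete, here is a rough sketch of the kind of audit I have in mind. It is purely illustrative: the `complete` stub stands in for whatever model you are testing, and the 20-token window and 5% threshold are arbitrary choices of mine, not taken from any of the papers above.

def ngrams(tokens, n):
    """Set of all contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated, licensed_source, n=20):
    """Fraction of n-grams in `generated` found verbatim in `licensed_source`.

    Crude whitespace tokenization; real studies use language-aware
    tokenizers and suffix-array matching, but the idea is the same.
    """
    gen_grams = ngrams(generated.split(), n)
    src_grams = ngrams(licensed_source.split(), n)
    if not gen_grams:
        return 0.0
    return len(gen_grams & src_grams) / len(gen_grams)

def audit(complete, prompts, licensed_sources, n=20, threshold=0.05):
    """Collect (prompt, source name, overlap) triples at or above `threshold`.

    `complete` is any function from a prompt string to generated code;
    `licensed_sources` maps names to the text of known-license code.
    """
    hits = []
    for prompt in prompts:
        generated = complete(prompt)
        for name, source in licensed_sources.items():
            score = verbatim_overlap(generated, source, n=n)
            if score >= threshold:
                hits.append((prompt, name, score))
    return hits

Run with targeted prompts like the Tim Davis one, against a reference corpus of copyleft-licensed scientific code, something like this would give a crude lower bound on verbatim reproduction, and would let you compare hit rates for scientific prompts against generic CRUD-style prompts.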
David Nicholson, Ph.D.
https://nicholdav.info/
https://github.com/NickleDave

On Sat, Feb 7, 2026 at 8:44 PM <[email protected]> wrote:
>
> Today's Topics:
>
>    1. Re: Current policy on AI-generated code in NumPy (Evgeni Burovski)
>    2. Re: Current policy on AI-generated code in NumPy (Matthew Brett)
>    3. Re: Current policy on AI-generated code in NumPy (David Cournapeau)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 7 Feb 2026 18:25:46 +0100
> From: Evgeni Burovski <[email protected]>
> Subject: [Numpy-discussion] Re: Current policy on AI-generated code in NumPy
>
> > Honestly now I find it reassuring to see broken English, typos, lazy
> > markdown formatting, grammatical errors and so on because it is so
> > much better that I am talking to a real human. I think most people
> > using LLMs to write comments literally don't understand this and often
> > just need to be told.
>
> (An ESL here). By all means do write it in the docs, and copy-paste it (or
> have a bot copy-paste it, even) as a reply to suspected AI-written comments.
> The barrier is real, an easy "solution" is readily available and is
> becoming ubiquitous, and the sentiment is very much not obvious.
>
> On Sat, Feb 7, 2026 at 12:10 AM Oscar Benjamin via NumPy-Discussion <
> [email protected]> wrote:
>
> > On Fri, 6 Feb 2026 at 22:44, Andrew Nelson via NumPy-Discussion
> > <[email protected]> wrote:
> > >
> > > Something we're also seeing is AI being used to draft comments in PRs. I
> > > think this is understandable, as English is not a first language for most
> > > people. However, it also has the effect of raising suspicions (rightly or
> > > wrongly) as to whether the code changes were produced by AI as well.
> >
> > I actually think that this is a bigger problem than people using AI to
> > write code. If all the code is written by AI (and it will be) then
> > human-to-human communication is the way to build trust. Allowing AIs
> > to poison that breaks everything.
> >
> > Honestly now I find it reassuring to see broken English, typos, lazy
> > markdown formatting, grammatical errors and so on because it is so
> > much better that I am talking to a real human. I think most people
> > using LLMs to write comments literally don't understand this and often
> > just need to be told.
> >
> > --
> > Oscar
>
> ------------------------------
>
> Message: 2
> Date: Sat, 7 Feb 2026 17:49:14 +0000
> From: Matthew Brett <[email protected]>
> Subject: [Numpy-discussion] Re: Current policy on AI-generated code in NumPy
>
> Hi,
>
> On Sat, Feb 7, 2026 at 4:54 PM Charles R Harris
> <[email protected]> wrote:
> >
> > On Sat, Feb 7, 2026 at 7:05 AM Matthew Brett via NumPy-Discussion <
> > [email protected]> wrote:
> >>
> >> Hi,
> >>
> >> This is just a plea for some careful thought at this point.
> >>
> >> There are futures here that we likely don't want.
> >> For example, imagine NumPy filling up with large blocks of AI-generated
> >> code, and huge PRs that are effectively impossible for humans to review.
> >> As Oscar and Stefan have pointed out - consider what effect that is
> >> going to have on the social enterprise of open-source coding - and our
> >> ability to train new contributors.
> >>
> >> I believe we are also obliged to think hard about the consequences for
> >> copyright. We discussed that a bit here:
> >>
> >> https://github.com/matthew-brett/sp-ai-post/blob/main/notes.md
> >>
> >> In particular - there is no good way to ensure that the AI has not
> >> sucked in copyrighted code - even if you've asked it to do a simple
> >> port of other, clearly licensed code. There is some evidence that
> >> AI coding agents are, for whatever reason, particularly reluctant to
> >> point to GPL licensing when asked for code attribution.
> >>
> >> I don't think the argument that AI is inevitable is useful - yes, it's
> >> clear that AI will be part of coding in some sense, but we have yet to
> >> work out what part that will be.
> >>
> >> For example, there are different models of AI use - some of us are
> >> starting to generate large bodies of code with AI - such as Matthew
> >> Rocklin: https://matthewrocklin.com/ai-zealotry/ - but his discussion
> >> is useful. Here are three key quotes:
> >>
> >> * "LLMs generate a lot of junk"
> >> * "AI creates technical debt, but it can clean some of it up too. (at
> >>   least at a certain granularity)"
> >> * "The code we write with AI probably won't be as good as hand-crafted
> >>   code, but we'll write 10x more of it"
> >>
> >> Another experienced engineer reflecting on his use of AI:
> >>
> >> """
> >> ... LLM coding will split up engineers based on those who
> >> primarily liked coding and those who primarily liked building.
> >>
> >> Atrophy. I've already noticed that I am slowly starting to atrophy my
> >> ability to write code manually. Generation (writing code) and
> >> discrimination (reading code) are different capabilities in the brain.
> >> Largely due to all the little mostly syntactic details involved in
> >> programming, you can review code just fine even if you struggle to
> >> write it.
> >> """
> >>
> >> https://x.com/karpathy/status/2015883857489522876
> >>
> >> Conversely - Linus Torvalds has a different model of how AI should work:
> >>
> >> """
> >> Torvalds said he's "much less interested in AI for writing code" and
> >> far more excited about "AI as the tool to help maintain code,
> >> including automated patch checking and code review before changes ever
> >> reach him."
> >> """
> >>
> >> https://www.zdnet.com/article/linus-torvalds-ai-tool-maintaining-linux-code/
> >>
> >> I guess y'all saw the recent Anthropic research paper comparing groups
> >> randomized to AI vs no-AI working on code problems. They found little
> >> speedup from AI, but a dramatic drop in the level of understanding of
> >> the library they were using (in fact this was Trio). This effect was
> >> particularly marked for experienced developers - see their figure 7.
> >>
> >> https://arxiv.org/pdf/2601.20245
> >>
> >> But in general - my argument is that now is a good time to step back
> >> and ask where we want AI to fit into the open-source world.
> >> We open-source developers tend to care a lot about copyright, and we
> >> depend very greatly on the social aspects of coding, including our
> >> ability to train the next generation of developers, in the particular
> >> and informal way that we have learned. We have much to lose from
> >> careless use of AI.
> >
> > E. S. Raymond is another recent convert:
> >
> > Programming with AI assistance is very revealing. It turns out I'm not
> > quite who I thought I was.
> >
> > There are a lot of programmers out there who have a tremendous amount of
> > ego and identity invested in the craft of coding. In knowing how to beat
> > useful and correct behavior out of one language and system environment, or
> > better yet many.
> >
> > If you asked me a week ago, I might have said I was one of those people.
> > But a curious thing has occurred. LLMs are so good now that I can validate
> > and generate a tremendous amount of code while doing hardly any hand-coding
> > at all.
> >
> > And it's dawning on me that I don't miss it.
> >
> > Things are moving fast.
>
> Yes - but - it's important to separate how people feel using AI from
> the actual outcome. Many of y'all will, I am sure, have seen this
> study:
>
> https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
>
> which showed that developers estimated they would get a 25% speedup
> from AI before they did the task; after they did the task, they felt
> they had got a 20% speedup; and in fact (compared to matched tasks
> without AI), they suffered a 20% slowdown.
>
> Personally - I am not very egotistical about my code, but I am
> extremely suspicious. I know my tendency to become sloppy, to make
> and miss mistakes - what David Donoho called "the ubiquity of error":
> https://blog.nipy.org/ubiquity-of-error.html . So AI makes me
> increasingly uncomfortable, as I feel my skill starting to atrophy (in
> the words of Andrej Karpathy, quoted above).
>
> So it seems to me we have to take someone like Linus Torvalds
> seriously when he says he's "much less interested in AI for writing
> code". Perhaps it is possible, at some point, to show that
> delegating coding to the AI leads to increased learning and greater
> ability to spot error - but so far the evidence seems to go the other
> way. And if we "embrace" AI for that use, we run the risk of
> deskilling ourselves, filling the code-base with maintenance debt,
> effectively voiding copyright, and making it much harder to train the
> next generation.
>
> Cheers,
>
> Matthew
>
> --
> This email is fully human-source. Unless I'm quoting AI, I did not
> use AI for any text in this email.
>
> ------------------------------
>
> Message: 3
> Date: Sun, 8 Feb 2026 10:43:29 +0900
> From: David Cournapeau <[email protected]>
> Subject: [Numpy-discussion] Re: Current policy on AI-generated code in NumPy
>
> Hi Matt,
>
> There are two aspects: can we use AI-generated code in numpy/scipy, and if
> we can, should we? And to make it more complicated, the type of AI usage
> affects those questions differently. E.g. I think almost nobody would
> object to the use I described originally: using chats to research, analyze
> literature, and understand existing codebases under acceptable licenses.
> There is no code generated there.
> Another extreme is all code generated and reviewed by AI.
>
> I will for now continue my original approach (no AI to generate any code
> unless trivial + disclose its use when PR time comes).
>
> David
>
> On Sun, Feb 8, 2026 at 2:52 AM Matthew Brett via NumPy-Discussion <
> [email protected]> wrote:
>
> > [quote of Matthew Brett's message trimmed; see Message 2 above]
> > >> """ > > >> > > >> > > > https://www.zdnet.com/article/linus-torvalds-ai-tool-maintaining-linux-code/ > > >> > > >> I guess y'all saw the recent Anthropic research paper comparing groups > > >> randomized to AI vs no-AI working on code problems. They found little > > >> speedup from AI, but a dramatic drop in the level of understanding of > > >> the library they were using (in fact this was Trio). This effect was > > >> particularly marked for experienced developers - see their figure 7. > > >> > > >> https://arxiv.org/pdf/2601.20245 > > >> > > >> But in general - my argument is that now is a good time to step back > > >> and ask where we want AI to fit into the open-source world. We > > >> open-source developers tend to care a lot about copyright, and we > > >> depend very greatly on the social aspects of coding, including our > > >> ability to train the next generation of developers, in the particular > > >> and informal way that we have learned. We have much to lose from > > >> careless use of AI. > > >> > > > > > > E. S. Raymond is another recent convert. > > > > > > Programming with AI assistance is very revealing. It turns out I'm not > > quite who I thought I was. > > > > > > There are a lot of programmers out there who have a tremendous amount > of > > ego and identity invested in the craft of coding. In knowing how to beat > > useful and correct behavior out of one language and system environment, > or > > better yet many. > > > > > > If you asked me a week ago, I might have said I was one of those > people. > > But a curious thing has occurred. LLMs are so good now that I can > validate > > and generate a tremendous amount of code while doing hardly any > hand-coding > > at all. > > > > > > And it's dawning on me that I don't miss it. > > > > > > Things are moving fast. > > > > Yes - but - it's important to separate how people feel using AI, and > > the actual outcome. Many of y'all will I am sure have seen this > > study: > > > > https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ > > > > that showed that developers estimated they would get a 25% speedup > > from AI, before they did the task; after they did the task, they felt > > they they had got a 20% speedup, and in fact (compared to matched > > tasks without AI), they suffered from a 20% slowdown. > > > > Personally - I am not very egotistical about my code, but I am > > extremely suspicious. I know my tendency to become sloppy, to make > > and miss mistakes - what David Donoho called "the ubiquity of error": > > https://blog.nipy.org/ubiquity-of-error.html . So AI makes me > > increasingly uncomfortable, as I feel my skill starting to atrophy (in > > the words of Andrej Karpathy quoted above). > > > > So it seems to me we have to take someone like Linus Torvalds > > seriously when he says he's "much less interested in AI for writing > > code". Perhaps it is possible, at some point, to show that > > delegating coding to the AI leads to increased learning and greater > > ability to spot error - but so far the evidence seems to go the other > > way. And if we "embrace" AI for that use, we run the risk of > > deskilling ourselves, filling the code-base with maintenance debt, > > effectively voiding copyright, and making it much harder to train the > > next generation, > > > > Cheers, > > > > Matthew > > > > > > > > > > -- > > This email is fully human-source. Unless I'm quoting AI, I did not > > use AI for any text in this email. 
