Hi,

On Thu, Feb 19, 2026 at 11:46 AM Ilhan Polat <[email protected]> wrote:
> [...]
>
> Using LLM to find copyright violations is, with all respect, one of the most
> ferrous irony I have seen lately. Did you check whether JAX and PyArrow claim
> by the LLM is correct before you accuse the PR author, is there an actual
> code resemblance confirmed by a human? (not blaming you obviously but I am
> sure you see the recursion you are creating here)
Yes, as you can imagine, I thought about that problem - of using AI to detect copyright violations in AI. To me, that is only an irony if we are thinking in a binary - AI-good, AI-bad. If I thought AI-bad, then I would think that AI-generated contributions are bad, and therefore I would also have to think that using AI as a jumping-off point for copyright assessment is bad.

However, AI-bad is not what I think. I do think (is this controversial?) that AI is unreliable; that, in typical use, without careful discipline, it will tend to reduce learning and understanding compared to doing the same task without AI; and that it can be useful, if we take those things into account.

Then I was thinking about the question that Evgeni (on the Scientific Python Discourse forum) and Robert had asked - which is: fine, copyright is an issue, but how can we reasonably ask the contributor to assess that? That's a serious and difficult question. One option is to throw up one's hands and say - OK, copyright is dead, let's ignore it, or at least deemphasise it. I don't think that's the right answer, which leaves me with the urgent problem of how to proceed.

Because this question is difficult, and very new (in the sense that it has now become very easy for good-faith submissions to violate copyright), it seems to me we will have to iterate. So I asked myself - if I had to start somewhere, how would I approach that problem?

The way I tend to use AI is as a jumping-off point - a starting point for a discussion with the AI. Quite often, as in this case, that jumping-off point is misleading or flat-out wrong - but if you know that (are there any experienced users of AI who don't know that?), then you can start to negotiate with the AI, and you will often, if you are careful, negotiate your way to something that you can verify from reliable sources.
You may have seen me taking that (I assume standard) approach in my negotiations with Gemini in a previous conversation about copyright, which I linked to as a Gist.

Now, this is a new world we're in. I'm not saying that's already a practical approach for contributors to explore copyright. I do think that I could use it that way, and that I'd get closer to a reliable answer than if I had not used it (and had no answer at all). I suspect that, if we trust our contributors, we will find that we and they develop good habits for that use. But it's a genuinely open question whether that is so.

As I keep saying, my intention was only to raise the idea as a starting point. And given the nature of AI, I therefore had to run the risk that the quoted AI output (from a simple prompt and response) would be misleading or wrong.

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]
