On Tue, May 5, 2026 at 5:45 PM pinoaffe <[email protected]> wrote:
>
> Greg Hogan <[email protected]> writes:
> > LLM output is licensable so your concerns are allayed.
>
> The jury is very much still out on that:
> - this has yet to be decided on by courts
> - even if one particular jurisdiction decides on this matter, there are
>   *many* more relevant jurisdictions
> - and even if suddenly all jurisdictions were to agree on some specific
>   copyright-status for LLM output, we as a community would *still*
>   need to decide whether we want to recognize that and incorporate it

Yes, but I think we would want to retain our free software identity. We
don't want to sell our soul in opposition to a new technology.

> There are many possible interpretations of the copyrightability of LLM
> output, a selection:
> - llm output is generally intellectual property of the user
> - llm output is generally intellectual property of the organisation "hosting"
>   the llm
> - llm output is generally public domain
> - llm output is neither anyones IP nor in the public domain
> - etc

None of those options makes any difference to us. Present one case where
this is a problem for free software.

> And even if llm output is generally thought to be licensable, this
> clearly cannot apply to any near-perfect copies of some part of its
> training data that it may randomly emit, so incorporating llm output
> into a GPL project would likely still be a legal risk

This is not happening in 2026. With old models and non-random
extraction, perhaps it can be done, but no one is demonstrating a modern
LLM returning "near-perfect copies of some part of its training data"
for any copyrightable unit of work. Just as with cryptography, where
important research is done on weakened algorithms (reduced iterations),
the demonstrations of targeted extraction and fine-tuning are reducing
our risk as mitigations are developed and applied.
