On 2026-02-20 14:39, Jonathan Dowland wrote:
> An LLM which was solely trained on a corpus of free software with intra-
> compatible licensing (for the sake of this example say, GPL2 or later,
> and anything compatible with it), such that we declare the resulting
> weightings to be a derivative, licensed GPL2+, and attribute the
> authorship to the union of authorship of *all* the inputs, and consider
> anything it outputs to be a derivative, likewise GPL2+. Would that be
> acceptable? Would that be useful?

I doubt so, for three reasons:
One, I don't think we have the standing to decide *on principle* that all outputs are derivative, in the legal sense of the term. That will be decided by courts and/or legislation, and the way it looks right now, outputs are not considered copyrightable at all. Even the training itself is a contentious issue, though current rules and legislation in development point in favor of permitting it. (I emphasized "on principle" because of the special case where outputs are merely reproductions of copyrighted material.)

Two, there are plenty of output categories for which we couldn't reasonably claim derivative status. My go-to example is using an LLM to fix grammar and spelling in a text, or an LLM auto-completing for-loops and similarly trivial code.

Three, I don't think anyone can simply assert the copyright of a gazillion authors for any given output. There has to be some meaningful relationship between the inputs for which authorship is being claimed and the outputs. Consider how it works today: if a human writes bar.c and some author claims it to be a derivative (in terms of copyright law) of their foo.c, that claim would be tested by a court based on the contents of each file. Why would this be any different if bar.c had been created by an LLM?

Best,
Christian

