On Thu, May 7, 2026 at 6:02 AM pelzflorian (Florian Pelz)
<[email protected]> wrote:
>
> Greg Hogan <[email protected]> writes:
>
> > On Tue, May 5, 2026 at 5:45 PM pinoaffe <[email protected]> wrote:
> >>
> >> Greg Hogan <[email protected]> writes:
> >> > LLM output is licensable so your concerns are allayed.
> >>
> >> The jury is very much still out on that:
> >> - this has yet to be decided on by courts
> >> - even if one particular jurisdiction decides on this matter, there are
> >>   *many* more relevant jurisdictions
> >> - and even if suddenly all jurisdictions were to agree on some specific
> >>   copyright status for LLM output, we as a community would *still*
> >>   need to decide whether we want to recognize that and incorporate it
> >
> > Yes, but I think we would want to retain our free software identity.
> > We don't want to sell our soul in opposition to a new technology.
> >
> >> There are many possible interpretations of the copyrightability of LLM
> >> output, a selection:
> >> - llm output is generally intellectual property of the user
> >> - llm output is generally intellectual property of the organisation
> >>   "hosting" the llm
> >> - llm output is generally public domain
> >> - llm output is neither anyone's IP nor in the public domain
> >> - etc
> >
> > None of those options makes any difference to us. Present one case
> > where this is a problem for free software.
> >
> >> And even if llm output is generally thought to be licensable, this
> >> clearly cannot apply to any near-perfect copies of some part of its
> >> training data that it may randomly emit, so incorporating llm output
> >> into a GPL project would likely still be a legal risk.
> >
> > This is not happening in 2026. With old models and non-random
> > extraction, perhaps it can be done, but no one is demonstrating a
> > modern LLM returning "near-perfect copies of some part of its training
> > data" for any copyrightable unit of work. Just as with crypto, where
> > important research is done on weakened algorithms (reduced iterations),
> > the demonstrations of targeted extraction and fine-tuning are reducing
> > our risk as mitigations are developed and applied.
>
> The ongoing GEMA suit is an example where the LLM printed near-verbatim
> song lyrics [1].  Generally, I remember Ekaitz’s suggestion in a mail
> from March [2] to add these words to the manual:

Again, that was extractive prompting on older models.

And it should be noted that, as a future defense against copyright
claims about new works, AI is creating an even larger public-domain
dataset. It is probable that in the near future nothing will be
copyrightable, as everything will be derivative of some AI-generated
creation.

> - If a significant portion of your contribution (i.e. beyond simple
>   autocomplete) was copied from somewhere else (e.g. AI, a website,
>   another software project...) you are required to disclose it in the PR
>   description.
> - If you cannot guarantee the provenance and legal safety of your code,
>   do not submit it.
>
> from [2].

To the first part I would only add that it should say "copyrightable contribution".

The second part, requiring a guarantee of legal safety, would prohibit
any contribution to our project.

> But my worry is that the agents (more than LLMs) obfuscate when they
> steal, and that people will not know when their LLM contribution to Guix
> is just a Scheme translation of other people’s copyrighted Rust code or
> was written by clickworkers.

What is a clickworker? A work for hire?

> That said, LLMs clearly show some intelligence of their own when
> figuring out the Lean GitHub code for the Erdős problems referenced in
> [3], and that code would then clearly be usable public-domain code.
>
> Regards,
> Florian
>
> [1] https://en.wikipedia.org/wiki/Artificial_intelligence_and_copyright#GEMA_v._OpenAI,_Inc.
> [2] https://lists.gnu.org/archive/html/guix-devel/2026-03/msg00102.html
> [3] https://arxiv.org/abs/2601.07421
