Re: Including LLM output in Guix

Greg Hogan Tue, 05 May 2026 12:05:27 -0700

On Tue, May 5, 2026 at 12:26 PM Nguyễn Gia Phong <[email protected]> wrote:
>
> Hi Greg,
>
> On 2026-05-05 at 11:05-04:00, Greg Hogan wrote:
> > On Tue, May 5, 2026 at 2:40 AM Nguyễn Gia Phong wrote:
> > [...]
> > > So yes, Oracle doesn't like contributions already in the public domain
> > > (so it can't sue other parties for infringements), but it's also
> > > not thrilled about infringing copyrights either (so it can be sued
> > > by other parties).
> > >
> > > I think the latter might apply to us.
> >
> > This has always been a risk accepting contributions. A developer could
> > copy and/or modify example code off Stack Overflow and fail to
> > properly attribute per CC BY-SA.
>
> Indeed, though it'd be put more aptly as
>
> > A developer could copy and/or modify example code off an LLM
> > and fail to check whether it is (near-)verbatim from the model's
> > training data.

I would be interested to see an analysis of incidental memorization
rather than the claims of extractive memorization typically presented,
one that also takes into consideration the age of the models.

> The issue is that such check is impossible to carry out,
> given the vast volume of the training data
> as well as the legality of obtaining them.

Legal or not, we won't have access to the new training data, which
will increasingly consist of private user interactions.

> Considering only copyright, I think it'd help to consider LLM output
> to be similar to a loose page you found on the street.  It might
> be from a book under copyright, or it might be really old
> and have entered the public domain, who knows, so it's prolly unwise
> to redistribute it.

True, but you could transform that snippet into a full new story, as
ideas are not copyrightable.

> On 2026-05-05 at 11:05-04:00, Greg Hogan wrote:
> > The risk to our project is mitigated in that most Guix contributions
> > are not copyrightable "factual" updates for versions, checksums, and
> > applying patches.
>
> Agreed, and these updates are not blocked by the lack of patches,
> but by the lack of reviewers.  (IMHO LLM cannot meaningfully participate
> in opening an editor and changing version and checksum strings,
> or downloading a patch file anyway.)

I don't understand what you mean by "LLM cannot meaningfully ...". Do
you mean that those actions are so simple as to not be a meaningful
contribution? I would be very interested to see an LLM attempt to
update Guix packages. Not simple leaf packages but core tools or
libraries with hundreds or thousands of dependent packages, many
requiring version updates or pinning or patches from upstream sources.
The Guix package set is more assembled than built.
