Nguyễn Gia Phong <[email protected]> writes:

> Hello Guix,
Thank you, Nguyễn, for holding free software high, and for mailing us
all.

> about the scope of this discussion.  This is about software freedom,
> not other criticisms towards LLMs such as code quality, climate
> impact, oligopoly, market manipulation, or digital ~nationalism~
> sovereignty.

Yes, limiting the scope is smart.

> I also assume we share the following axiomata.
>
> 1. No existing LLM limits its training data
>    to works belonging to the public domain.
> 2. LLMs may leak their training data, outputting verbatim copies
>    of their training materials.
> 3. From around 15 lines, code/text is eligible for copyright [2].
> 4. We do not take upstreams' copyright claims for granted.

Agreed for large Large Language Models.

> Relicensing works whose copyright one does not hold is not only
> illegal but can also take away users' freedom when
>
> - Attribution is removed, so users would not be able to locate
>   the original's parent work to exercise their software freedom, or
> - Copyleft is removed, and the derivative's parent work is intended
>   to be part of an ~open-core~ proprietary project.

Yes, as you say:

> On 2026-02-04 at 09:06Z, Florian Pelz wrote:
>> if they keep up the rsyslog way of working, [commit 262b22d82a],
>> likely they will never be shoplifting significant portions of code.
>
> [commit 262b22d82a]:
> https://github.com/rsyslog/rsyslog/commit/262b22d82a40811ee14ed2cc3ff930d8eb45c9d4
>
> This commit includes multiple hunks exceeding 15 LoC.
> If they were (modifications of) LLMs' output, this would be
> license laundering (see axiomata 1, 2, and 3).
>
> On 2026-02-04 at 09:06Z, Florian Pelz wrote:
>> We should not shun them just because maybe.
>
> There are different kinds of maybes.  Baseless suspicion is one
> thing; rsyslog openly admitting to using LLMs' output is clearly
> another.  The uncertainty here is about which snippets must be
> rewritten to revert the copyright violations, not whether they did
> something wrong.
Well, here I believe no snippet in rsyslog is in violation, even
though copying some LLM answers (or their training data) would be a
violation.  To support this, we should look at what courts of law
decide, which for source code has not happened yet, I believe.  But I
will watch the Wikipedia entry
https://en.wikipedia.org/wiki/Artificial_intelligence_and_copyright

I also thought the Free Software Foundation said such violations are
rare, but apparently I misremembered; their stance is not yet clear?

> I suggest we take the reactive approach: if a package becomes known
> to us to be bad (violating copyright, containing a backdoor, failing
> to build), then we either make it good or remove it.

Certainly, being reactive is the minimum legal obligation.  And
reactive is all we should be for now.

> Even if it makes zero difference in practice, contributors and users
> (myself included) would like to know if Guix welcomes and
> redistributes license-laundered works.

We only disagree on whether license laundering happens in practice in
Guix packages (and perhaps in LLM-aided Guix contributions).  Guix's
packages are not vibe coding.  So far at least, I believe.  Except
maybe some packages quickly imported from CRAN etc. and, by accident,
not checked carefully.

Regards,
Florian
