Hi,

I think we need to avoid making rules about "AI" itself (which isn't clearly defined) or about LLMs (still not defined clearly enough), and should instead decide what we really want and stand for.
We could then cite specific uses of LLMs (naming specific LLMs) as examples or clarifications of things we refuse, or of rules that *already exist* or that we create or modify.

Why
---

The danger of going directly against AI, rather than directly against its negative consequences, is that:
- It would increase confrontation, because the rules would no longer be clear. Nonfree software is already forbidden. For example, allowing any LLM-generated code up to a limit of 15 lines would let contributors break rules we already have and put the project at risk (I explain why at the end). In addition, people could argue that some "AI" systems are good, but if we instead insist on the rules we already have, this argument becomes irrelevant (see the section about rules below).

Having good rules that are precise enough to do what we want is extremely important, I think, because otherwise we risk doing the exact opposite of what we want (this will become clearer below).

Rules that already exist and possible additional rules
------------------------------------------------------

We already have very clear rules that are either implicit (because they are extremely obvious) or explicit, for instance:

- The patches people provide to Guix are supposed to be under a free license and legal in several jurisdictions. Linux has a DCO for that. Since we don't have anything similar yet, we probably need a GNU DCO or a Guix DCO, or to reuse the Linux one, to make sure that Guix contributors can be held responsible in case of (legal) issues.

- The same applies to the packages that the substitute servers ship. If some generated code has an unknown copyright status (because it was generated by an LLM, or because we don't know whether the files are legal, as with the files we remove from the sdcc package), I don't think that arguing for keeping software that is illegal under copyright law is going to work here.

  The solution has always been to remove such files, like this (example taken from sdcc):

  > (snippet #~(begin
  >              ;; Remove non-free source files.
  >              (delete-file-recursively "device/non-free")))

  Free software has not always respected every law in every jurisdiction, and probably never will (free software stands against DRM, for instance). Even so, it has usually respected copyright law pretty well. It was also badly damaged by the SCO v. IBM lawsuit, where the collateral damage to the FLOSS community at large was huge. This is why the DCO and similar processes were adopted by free software projects.

  Another example is the discussion about the ZFS kernel module, which cannot be redistributed once it is compiled. Guix decided not to ship that module in compiled form (we lacked any analysis for the case of uncompiled source code).

- Free software LLMs probably exist (like kaldi), but they raise the question of the cost for Guix. If in the future the cost is small enough, Guix would therefore have to train the models, just as it compiles software. The training would also need to be reproducible (so that a retrained model is functionally equivalent, just as even non-reproducible software is functionally equivalent when recompiled), the result would need to be understandable by humans, etc. For instance, Guix is responsible for security, it has to make decisions about maintenance, we need to ensure that packages stay free, etc., so we need to be able to understand the software we package. All of that is impossible with obfuscated source code, for instance (and in the case of obfuscated source, it's pretty clear that it doesn't count as source code, as it is not the preferred form for making modifications).

- The FSDG already provides ways to exclude third-party repositories that contain nonfree software (like models under nonfree licenses). Many of these slipped in, and I think we should consider that a bug and work toward a resolution, step by step, with the (limited) resources we have.

> This proposal takes a clear stance that not everyone may agree with.
> This could lead to fragmentation within the Guix community, or within
> the free software community.

I think it's the other way around: violating all the rules above would lead to fragmentation, infighting, etc. After all, Guix is free software and it's meant to package free software. I don't see why we should compromise our principles and stop curating packages to make sure that they respect users' freedom just because something like an LLM looks powerful. That makes no sense. Even the FSF is going in that direction (reference: the FSF talks on LLMs etc. at FOSDEM 2 years ago).

Many other distributions really do have to take a stance against AI precisely because they also want to package nonfree software, so many of the rules above don't apply to them.

In our case, I think we need one extra rule: we should be able to refuse to package software that puts Guix contributors and/or users at risk of infighting. An example is software maintained by people who oppress its contributors (probably Xlibre here, which we already discussed on the mailing list). In that discussion, if I recall correctly, Efraim pointed out that Debian has a rule like that, and I think it makes sense to adopt a similar one. It would allow us not to package anything that looks like "AI".

In the end, nothing about "AI" looks particularly special. The rules we have are good, so good that they mostly apply to "AI" as well. We just need a DCO and the ability to refuse to package controversial software so that the community isn't divided.
At the end of the day, if LLMs with huge costs or with nonfree models are somewhat useful to people, then I think the way to go would be to have nonguix package them and/or to make nonguix trivial to install. Ideally, nonguix would also be renamed to look like a real distribution, host documentation, etc. It could then be almost like Guix and reuse Guix in a sustainable way, while its different name avoids confusion. We already have a similar rule for software that is no longer maintained: it can go into guix-past. The cost looks small there too, since guix-past is also FSDG compliant.

> Nevertheless, code claimed to be produced in whole or in part by
> genAI **may be incorporated in the limit of at most 15 lines of
> code** to ensure the contributor has a valid copyright claim on the
> code.

I think the problem is something else: how do we make sure that those 15 lines are not derived from work that is incompatible with GPLv3 or later? Besides that:

- Under the current rules, nothing prevents contributors from including code or data to which copyright doesn't apply, because that is compatible with GPLv3. Requiring everything to be GPLv3 would complicate things a lot.
- As far as I know, no law mentions 15 lines. One line or less could be copyrightable, while more than 15 lines might not be (for instance, a sort algorithm implemented in the canonical way). Such a rule would force anyone who reviews patches to become an expert in the laws of many jurisdictions, and I don't think we realistically have the resources for that.

So if we need a GCD specifically for LLMs, I think it should clarify what already exists and possibly add small rules that close the gaps we have.

Denis.
