Hey Ludo,

I have some comments from a quick read. I hope it's fine that I am not waiting for the formal discussion period, as I see no reason to postpone commenting. I might be busier later on, so the comments might not come at all, which would seem unfortunate to me.

Ludovic Courtès <[email protected]> writes:

> authors: Ludovic Courtès

I understand we've established that the header will be 'authors', not just 'author', even when there is just one person. However, 'authors' is also used multiple times in the text, and that confused me: I wasn't sure whether it referred to the GCD author or to the authors of something else. I propose to use 'an author' in the text instead (as long as there is no second author).

>
> ## Pledge
>
> We propose the following project commitments:
>
> 1. The project **will not use nor encourage use of genAI** for its
>    code, packages, code review, artwork, translations, or any other
>    artifacts.

What does 'the project' mean here? There are maintainers, committers, team members behind this project... Is this supposed to say that committers will not use genAI for work on Guix on the tasks outlined here? Or only that the computing resources of the project won't use automated review checkers, automatic package updates using LLMs, and so on?

I think it would be nice if this was written more clearly, either saying what individuals are or aren't supposed to do with regard to Guix, and/or defining in what sense the project won't use them. In the end it's about the responsibilities of those individuals, so claiming that the project won't do something makes it vague. I can think of many interpretations of this and I am not sure which one is the right one.

The previous paragraphs apply mostly to the 'will not use genAI' point.
The other point, 'encourage use of genAI', is clearer, I would say (but still not 100%). I can imagine it meaning that officially, on the sites, in the documentation of the project and so on, there won't be text saying 'you might want to use LLM for X, Y and Z'. So I think even this other point would be worth expanding.

> 2. We kindly ask contributors to respect this choice and not use LLMs
>    for their contributions to Guix. Nevertheless, code claimed to be
>    produced in whole or in part by genAI **may be incorporated in the
>    limit of at most 15 lines of code** to ensure the contributor has a
>    valid copyright claim on the code.

What if it's 16? What if I fold a few lines into one to stay within the limit? What if the contribution is split into several to comply?

I have looked online and I understand some sources follow this heuristic of 15 lines of code being non-copyrightable, so I suppose it comes from such places? But there can be 15 lines of code that are just formatting, updating a version, or toggling a simple parameter like #:tests?, and there can be 15 lines of code that change how the whole Guix System boots. So wouldn't it be better to say something along the lines of: if the code would not be copyrightable when written by a human, it is fine for it to be produced by genAI? Or are there some laws or legal precedents that operate on a rule of exactly 15 lines of code, without taking the contents into account? In that case, we can definitely leave this point as is.

> 3. Software where the majority of commits were authored or co-authored
>    by genAI **will not be packaged in Guix**. Notable examples of
>    such code include [Claude’s C
>    compiler](https://github.com/anthropics/claudes-c-compiler/),
>    [EmDash](https://github.com/emdash-cms/emdash), and
>    [Neomacs](https://github.com/eval-exec/neomacs),

I think there will be software that is currently not written by LLMs, but might gradually become so.
Parts of its code would be progressively replaced by LLM-written code. So for such cases, maybe there should also be a clause regarding updating software? I.e. that Guix won't update even already packaged software once it gets substantial rewrites through LLMs, and would possibly deprecate it, following the rules of the GCD on deprecation?

Additionally, I think a better metric would be the code itself, not the commits. If you had a project with 5000 commits and then one commit rewriting most of it through an LLM, it might still slip through as a project that may be packaged in Guix. Of course you could argue it's not even the same project anymore, but that's more of a philosophical question and opinions might differ, so looking at how much of the code was written by an LLM seems to me more resilient to such disputes.

Lastly, I have a question about the motivation for this point. Is it meant as an extension of the fact that Guix only packages open-source-licensed software, or is it meant out of 'fear' that LLM-written code is not free & open source, taking into account the question of copyrightability of LLM code? If it's the former, okay. If it's the latter, I think this point could be generalized and made dependent on how the legal framework around LLMs gets established in the future. Of course this is currently unknown territory, so it probably has to be called out explicitly at least in some way, but it might be good to state the motivation here, so that once the FOSS community and the law have fully come to terms with a world with LLMs, it will be easier to argue how to change this point, if at all.

> 4. Packages in Guix will always be **built from source**, the only
>    exceptions being compilers or build systems for which a bootstrap

I am only quoting the beginning of this point. My understanding is that this is already the case and that this GCD is not changing anything. If so, I think it would be nice if that was called out explicitly.
So something along the lines of: 'Guix has already committed to building everything from source, with the only exceptions... And in case of software containing fully trained neural networks that means ...' Of course, the part about neural networks is then a new one, and it's good to establish that.

---

I understand my points might make the pledge longer, which might be undesirable. If so, I think it would be okay to have more details later on in the GCD. The pledge itself does not have to be unambiguous as long as it is explained elsewhere in the document.

Rutherther
