Hi Dan,
thanks for bringing this up.
Now, I want to ask whether anyone sees any issue with merging
AI-generated program code.
I think there are all sorts of issues, and I won't go into the broader
issues regarding the copyright status of AI-generated code (which might
contain copyrighted bits of training material), the environmental impact
of AI data centers etc. Instead, I'll focus on the question of quality
of code contributions, which of course automatically involves how
that quality gets assessed.
This is somewhat urgent in the sense that we have such an MR in draft,
though of course we could delay it until there is certainty. I just
wouldn't want the uncertainty to last so long that a capable
contributor gets frustrated and leaves.
To my knowledge (and I have been paying only minimal attention), the
FSF views AI-assisted contributions to GNU projects as potentially
problematic but has not established a policy.
As a reviewer, I strongly desire two things:
1. openness about the origin of the code I'm reviewing
2. accountability of the human submitter (not reviewers)
   for the code that is merged
For the MR that is in draft now, there were tells in the patch, but I
had to ask the submitter twice before he confirmed that it was
"AI-assisted."
In my opinion, LilyPond should only contain code that a human
understands. It is essential that the creator of an MR understands their
code, and it is desirable that a code reviewer understands the code as
well (the latter, of course, depending very much on the time and
generosity of people willing to do reviews). This is not just a question
about AI contributions: It means that LilyPond also shouldn't contain
human-written code of the "I added that line and then the problem
somehow went away, knock on wood" type. It's of course hard to enforce
this, but a thorough review where questions can be raised and must be
dealt with makes it more probable.
To streamline this in the future, I propose configuring a template for
default MR descriptions something like this:
##### Description
<!-- Describe your motivation and your work briefly
to orient reviewers. If you have not described
your commits well, go back and do that first. -->
##### Question
What percentage of this work is AI-generated? <!-- 0-100 -->
Do you think that would effectively address that specific concern?
Of course, since the number given will (according to your proposal)
influence how the MR is dealt with, we depend on getting an honest
answer to that question. I don't want to seem paranoid, but maybe it
would be wise to add - somewhere in the CG - a statement along the lines
of: Commits with non-disclosed AI-generated code get refused (or may get
reverted later).
In the discussions of the current MR I noticed the term "AI-assisted".
Maybe it would be a good idea to distinguish various kinds and degrees
of AI assistance: IIUC, not every way of using an LLM during development
leads to longer, coherent blocks of AI-generated code.
Therefore, I suggest adopting a new policy: AI-generated program code
does not automatically move forward without a human reviewer's
acknowledgment. It should be full acknowledgment, not, for example,
"C++ LGTM; don't know about Scheme."
It would fall to the "patch meister" to help people follow this policy
and to allow sensible exceptions, such as if a contributor with a good
record vouches for the quality of his own AI-generated submission in
an area where he has developed expertise.
I support this. This basically means that you either have to motivate
reviewers to look at your code, or you have to build a reputation by
smaller patches that both show and increase your familiarity with the
codebase.
Lukas