Simon Richter <[email protected]> writes:

> But: we certainly cannot say the same for everyone who wants to
> contribute, so we need either
> 1. a stronger vetting process before giving people upload permission, or
> 2. a review process by which any contribution must be approved by other
> developers.

I think part of this discussion, and the point that I think people are
going to need some convincing on, is whether the third and existing option
will work: Treating LLM-based errors the same as any other errors for any
other reason, tackling them reactively, and taking some sort of protective
action if we get too many errors from the same contributor.

I understand the concern that this won't be adequate because the potential
volume of contributions using LLMs is so much higher, but do we have
evidence yet that it won't be adequate? What evidence do we have that we
have to handle LLM-based errors differently than other errors?

I'm not saying there is no evidence, or that there are no good arguments
here, only that I think this is the place where those arguments need to be
made. If LLM use in contributions is a quality problem that is
sufficiently large that it is going to cause disruptive problems unless we
take some sort of action, that's a good argument for taking some sort of
action, but also presumably it would be throwing off early warning signs
that we can point to.

Both 1 and 2 are pretty intrusive and obnoxious changes to our workflows.
There's going to be a lot of reluctance about making such a change unless
we have to.

> "I can use this technology in a responsible way, so it needs to be
> allowed" is not a policy, and neither are "you can't stop me anyway!" or
> "any attempt to create policy will lead to [absurd outcome]."

I agree with all of this, for the record.

Somewhat by definition, if LLMs are causing significant problems, we will
be able to see that and take action against the contributors responsible
for causing those problems. This is clearly a place where we can create a
policy and enforce it if we're seeing problems from LLMs. The missing
piece for me is *are* we seeing problems from LLMs?

I realize that's your point about wanting to be proactive rather than
reactive: Even if we're not seeing problems now, our tools are bad enough
for handling problems that we should act proactively now. I'm not sure
that I'm convinced by this. Maybe what we need instead is to be proactive
about having better tools to handle bad uploads?

> I am fairly sure not even the proponents of AI use would be happy with a
> policy of "AI use is allowed, but if you make a bad upload as a result
> of not doing a thorough review, we nuke your upload permissions and make
> you go through a tasks&skills check again", but I see very few options
> here.

I don't know, what's wrong with that? I'm not sure a single bad upload is
the right threshold, but that seems reasonable?

For me, it's a percentage thing: If you've made thousands of uploads and
you screw up one, everyone is human, it happens. If you've made two
uploads and your second one is a major mistake, half of your uploads are
major mistakes and maybe someone needs to look over what you're doing.

And yes, that does mean that someone who has a solid track record in
Debian but then goes through some change that causes their work to
significantly degrade (which could be all sorts of things from sleep
deprivation to other health issues, not just the impact of new tools) is
going to have a fair bit of leeway to screw up before someone does
something.
I think that's basically correct and that this is what unstable is for,
but we probably should have some threshold for intervention that's a bit
more proactive than we have right now.

-- 
Russ Allbery ([email protected]) <https://www.eyrie.org/~eagle/>

