Peter Samuelson <pe...@p12n.org>:
>
> [Eric S. Raymond]
> > 1. Add support to the client tools for shipping a FULLNAME field
> > mined from somewhere under ~/.subversion. Maybe the existing
> > username entry will do, maybe it won't - I see arguments both ways.
> > I don't care, we can fill in that detail later.
>
> This part (upon which your whole proposal hinges) makes me scratch my
> head a bit. Why should the client be involved at all?

Because other ways of setting this bit of metadata have serious though
un-obvious issues, and offer only partial coverage for particular
special cases. This one mechanism would address *all* the deployment
cases, and would do so in a way that properly empowers users, minimizes
work for administrators, and avoids scaling problems. I will address
each of these points.

But first I want to clearly lay out the desirable qualities and
consequences of fullname/email attribution cookies; they bear on the
use cases I'm going to describe.

1. They have enough entropy that collisions aren't a practical problem.
   A human name alone does not. I'm excluding deliberate spoofing from
   the analysis because we now have enough experience with
   un-cryptosigned commits in DVCSes to say that this effectively never
   happens.

2. Because of (1), they allow repositories to be mobile as a
   disaster-recovery hedge. I've already explained this in detail and
   won't rehash it except to remind everyone that mobile != distributed
   - I'm not trying to morph Subversion into a DVCS.

3. They imply an email pointer back to a responsible person for each
   commit.

4. They function as Internet-wide primary keys for reputation systems.
   The type case I have in mind is Ohloh, which aggregates statistics
   from multiple repositories. It uses your fullname/email pair as a
   primary key to automatically identify you as a committer in multiple
   projects, which in turn feeds their "kudos" reputation system. High
   kudos identifies you as a good person to collaborate with. We'll
   undoubtedly see more of this sort of thing.

Now, with these use cases in mind, let's consider different protocols
that might allow users to set their attribution cookies.

A. Users *can't* set them. Instead, we generate them - say, by consing
   up a cookie from a generated email address on the host and the
   contents of the user's GECOS field.

B. Site administrators can set them by editing something on which users
   don't normally have modification rights; an LDAP directory will
   stand as an example. This is one variant of your "account database"
   case.

C. Users can change them through site-specific interfaces such as forge
   glue code, not needing to go through an administrator. This is
   another variant of your "account database" case.

D. Users can set them through a preference in the Subversion client.

OK, now let's matrix the modification scenarios with the use cases and
see how they fare.

Case 1 is equally good under A, B, C, and D. So is case 2.

It's case 3 where we start to run into trouble. Suddenly A doesn't look
so good. If the host referenced in a constructed cookie goes away, so
almost certainly does any email pointer back to the person with that
hostname wired in. This is the same disaster scenario for which we want
painless repository mobility.

If we want to preserve the case 3 property that attribution cookies are
reliable pointers to people, we need a method that lets users set the
email address component to something that can be expected to be valid
on a longer timescale, like a personal domain (mine, admittedly an
extreme case, has been valid since 1985).

In theory, protocols B or C or D will do. But look at the difference: B
and C imply a whole lot more work and a whole lot more places for the
process to fail. Let's suppose that a user wants to point commits back
to himself at a stable mail provider across multiple repositories. Now
he has to (a) remember his backtrail, (b) navigate M different
site-local interfaces, and/or (c) petition N sets of system
administrators, none of whom will thank him for increasing their
workload.

Contrast protocol D: the user sets *one* preference in *one* place.
He's done, nobody else had to do any work, and the change is guaranteed
to be reflected in all his future commits. No scaling problem here.

Now consider case 4.

Sites like Ohloh increase the value of having a *single* email address
in all your attributions, one that is not only Internet-scoped but
belongs to *you* and is lifetime-stable.

The addresses <e...@apache.org> and <e...@savannah.org> are both
Internet-scoped - but every DVCS I use knows I am <e...@thyrsus.com>
because I configured that *once*. That means "Eric S. Raymond
<e...@thyrsus.com>" goes into every DVCS commit I make, and whenever
anyone registers in a new repo Ohloh has no trouble identifying my
commits.

This is parallel to the case 3 argument for D. But it's a *different*
argument, showing that the value of protocol D is not a fluke attached
to just one identity use case.

So, in summary, every choice other than letting people set their own
attribution cookie through the client pointlessly complicates life: it
causes code duplication, works admins harder, and introduces failure
modes that make it more difficult to maintain a stable public identity.

That having been said, I still think case 2 (repo mobility, for which
we don't strictly need user-settable attribution cookies) is the most
important win. But why buy all this other grief when we can easily
empower users to solve the *whole* problem?

> Now, of course, it's possible that some people would want a more
> structured way to express the _author_ as opposed to the _committer_.
> As someone said upthread, it's perhaps unfortunate that our svn:author
> property really means committer. But it is what it is, and I don't
> think it's wise to change its semantics to be a more loosely defined
> "original author". If needed, that should be a separate field.
>
> And indeed, standardizing an "original author" field may not be without
> controversy. It is a step along the road to also more policy-driven
> features such as the famous Signed-Off-By from git land. At some point
> it stops being Subversion's responsibility to standardize this kind of
> metadata. There may be some question whether that point is "original
> author" or "Q/A signoff" or something else.

I consider this (how to amend the attribution model) an orthogonal
issue. It's a whole different discussion into which it would be best
not to get sidetracked.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>