Re: RFC: simple proposal for Internet-scoped IDs

Eric S. Raymond Tue, 04 Dec 2012 00:39:56 -0800

Peter Samuelson <pe...@p12n.org>:
> 
> [Eric S. Raymond]
> > 1. Add support to the client tools for shipping a FULLNAME field
> > mined from somewhere under ~/.subversion.  Maybe the existing 
> > username entry will do, maybe it won't - I see arguments both ways.
> > I don't care, we can fill in that detail later.
> 
> This part (upon which your whole proposal hinges) makes me scratch my
> head a bit.  Why should the client be involved at all?


Because other ways of setting this bit of metadata have serious though
un-obvious issues, and offer only partial coverage for particular special
cases.  This one mechanism would address *all* the deployment cases, and
would do so in a way that properly empowers users, minimizes work for
administrators, and avoids scaling problems.

I will address each of these points. But first I want to clearly
lay out the desirable qualities and consequences of fullname/email
attribution cookies; they bear on the use cases I'm going to describe.

1. They have enough entropy that collisions aren't a practical problem.
A human name alone does not.  I'm excluding deliberate spoofing from
the analysis because we now have enough experience with un-cryptosigned
commits in DVCSes to say that this effectively never happens.

2. Because of (1), they allow repositories to be mobile as a
disaster-recovery hedge.  I've already explained this in detail and
won't rehash it except to remind everyone that mobile != distributed -
I'm not trying to morph Subversion into a DVCS.

3. They imply an email pointer back to a responsible person for each commit.

4. They function as Internet-wide primary keys for reputation systems.
The type case I have in mind is Ohloh, which aggregates statistics
from multiple repositories. It uses your fullname/email pair as a
primary key to automatically identify you as a committer in multiple
projects, which in turn feeds their "kudos" reputation system.  High
kudos identifies you as a good person to collaborate with.  We'll
undoubtedly see more of this sort of thing.

Now, with these use cases in mind, let's consider different protocols 
that might allow users to set their attribution cookies.

A. Users *can't* set them. Instead, we generate them - say, by consing
one up from a generated email address on the host and the contents of
the user's GECOS field.

B. Site administrators can set them by editing something on which 
users don't normally have modification rights; an LDAP directory
will stand as an example.  This is one variant of your 
"account database" case.

C. Users can change them through site-specific interfaces such as
forge glue code, not needing to go through an administrator. This is
another variant of your "account database" case.

D. Users can set them through a preference in the Subversion client.

OK, now let's matrix the modification scenarios with the use
cases and see how they fare.

Case 1 is equally good under A, B, C, and D.  So is case 2.

It's case 3 where we start to run into trouble.  Suddenly A doesn't
look so good. If the host referenced in a constructed cookie goes
away, so almost certainly does any email pointer back to the person
with that hostname wired in.  This is the same disaster scenario for
which we want painless repository mobility.
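
To make the weakness of protocol A concrete, here is a minimal sketch
in Python of how a server might cons up such a cookie.  The function
name and the "Name <login@host>" layout are assumptions for
illustration only; nothing like this exists in Subversion today.

```python
def generated_cookie(login: str, gecos: str, host: str) -> str:
    """Protocol A sketch: derive an attribution cookie with no user input.

    Hypothetical helper, not part of Subversion.
    """
    # By convention the first comma-separated GECOS subfield holds the
    # user's full name; fall back to the login name if it is empty.
    fullname = gecos.split(",")[0].strip() or login
    # The repository host is wired into the address, so the pointer
    # back to the person dies when the host does.
    return f"{fullname} <{login}@{host}>"
```

For example, generated_cookie("jrh", "J. Random Hacker,,,",
"svn.example.org") yields "J. Random Hacker <jrh@svn.example.org>",
an address that is only as durable as svn.example.org.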

If we want to preserve the case 3 property that attribution cookies are
reliable pointers to people, we need a method that lets users set the
email address component to something that can be expected to be valid
on a longer timescale, like a personal domain (mine, admittedly an
extreme case, has been valid since 1985).

In theory, protocols B or C or D will do.  But look at the difference:
B or C imply a whole lot more work and a whole lot more places for the
process to fail. Let's suppose that a user wants to point commits back
to himself at a stable mail provider across multiple repositories.
Now he has to (a) remember his backtrail, (b) navigate M different
site-local interfaces, and/or (c) petition N sets of system
administrators, none of whom will thank him for increasing their
workload.

Contrast protocol D: the user sets *one* preference in *one* place.
He's done, nobody else had to do any work, and the change is
guaranteed to be reflected in all his future commits.  No scaling
problem here.
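
A rough sketch of what protocol D could look like on the client side,
in Python for brevity.  The [attribution] section and its fullname and
email keys are purely hypothetical; no such option exists in the
Subversion config file today, and the real design might reuse an
existing entry instead.

```python
import configparser
from typing import Optional


def cookie_from_config(text: str) -> Optional[str]:
    """Protocol D sketch: build the cookie from one client-side preference.

    `text` is the content of an INI-style client config file.  The
    [attribution] section and its keys are assumptions for illustration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("attribution"):
        return None
    fullname = parser.get("attribution", "fullname", fallback="").strip()
    email = parser.get("attribution", "email", fallback="").strip()
    # Refuse to ship a half-configured cookie.
    if not fullname or "@" not in email:
        return None
    return f"{fullname} <{email}>"
```

The point of the sketch is that the value is read from one file the
user owns, so every repository he commits to sees the same string.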

Now consider case 4.  Sites like Ohloh increase the value of having a
*single* email address in all your attributions, one that is not only
Internet-scoped but belongs to *you* and is lifetime-stable.

The addresses <e...@apache.org> and <e...@savannah.org> are both
Internet-scoped - but every DVCS I use knows I am <e...@thyrsus.com>
because I configured that *once*.  That means "Eric S. Raymond
<e...@thyrsus.com>" goes into every DVCS commit I make, and whenever
anyone registers a new repo, Ohloh has no trouble identifying my
commits.
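
The primary-key behavior can be illustrated with a small Python
sketch.  This is not Ohloh's actual algorithm, only an assumed
simplification in which the whole fullname/email string is the key;
the function name and sample data are made up.

```python
from collections import defaultdict


def repos_by_identity(commits):
    """Group repositories by the attribution cookie found on their commits.

    `commits` is an iterable of (repo_name, cookie) pairs.  Returns a
    dict mapping each cookie to the set of repos it appears in.
    """
    index = defaultdict(set)
    for repo, cookie in commits:
        index[cookie].add(repo)
    return dict(index)


sample = [
    ("repo-a", "J. Random Hacker <jrh@example.org>"),
    ("repo-b", "J. Random Hacker <jrh@example.org>"),
    # A host-bound cookie looks like a different person to the aggregator.
    ("repo-c", "J. Random Hacker <jrh@repo-c.example>"),
]
identities = repos_by_identity(sample)
```

With one stable cookie, repo-a and repo-b collapse into a single
identity; the host-specific address on repo-c does not.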

This is parallel to the case 3 argument for D.  But it's a *different*
argument, showing that the value of protocol D is not a fluke
attached to just one identity use case.

So, in summary, every choice other than letting people set their
own attribution cookie through the client pointlessly complicates
life - causes code duplication, works admins harder, and introduces
failure modes that make it more difficult to maintain a stable
public identity.

That having been said, I still think case 2 (repo mobility, for which
we don't strictly need user-settable attribution cookies) is the most
important win. But why buy all this other grief when we can easily
empower users to solve the *whole* problem?

> Now, of course, it's possible that some people would want a more
> structured way to express the _author_ as opposed to the _committer_.
> As someone said upthread, it's perhaps unfortunate that our svn:author
> property really means committer.  But it is what it is, and I don't
> think it's wise to change its semantics to be a more loosely defined
> "original author".  If needed, that should be a separate field.
> 
> And indeed, standardizing an "original author" field may not be without
> controversy.  It is a step along the road to also more policy-driven
> features such as the famous Signed-Off-By from git land.  At some point
> it stops being Subversion's responsibility to standardize this kind of
> metadata.  There may be some question whether that point is "original
> author" or "Q/A signoff" or something else.

I consider this (how to amend the attribution model) an orthogonal
issue.  It's a whole different discussion into which it would be best
not to get sidetracked.
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
