On Oct 19, 2008, at 11:34 PM, Antony Blakey wrote:
If you want to ensure that the username is unique at the time the
user enters it, then you need a central synchronous service. Using
the username/password combination as a pair isn't a good idea,
because it only takes two naive/lazy users choosing similar passwords
(based, say, on their usernames :) for a collision to subsequently
occur.
I've been considering this in a production environment, and I came
up with four solutions:
1. Append some form of unique id from the server you are currently
talking to, e.g. a checksum of the machine uuid + process id. Any
checksum is going to have some chance of global collision, but it
could be made vanishingly small. Not great for the user, because they
end up with a complicated username.
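As a sketch, option 1 might look something like this (the naming scheme and `NODE_TOKEN` are invented for illustration):

```ruby
require 'securerandom'

# Sketch of option 1 (naming scheme invented): qualify the chosen name
# with a token unique to the node/process that accepted the signup, so
# two servers can never mint the same full username.
NODE_TOKEN = "#{SecureRandom.uuid[0, 8]}-#{Process.pid}"

def qualified_username(chosen)
  "#{chosen}.#{NODE_TOKEN}"
end
```

The downside is exactly the one described above: the user walks away with something like "bob.3f9c2a1d-4242" instead of "bob".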
2. Define your user interaction such that it can deal with
subsequently needing to add a suffix to the username, e.g. when
you get a replication conflict (which could involve a number of
conflicts equal to the number of writable replicas), you amend some
or all of the names to include a serial number and then email the
users. This complicates things for the user: they end up with a
username they haven't chosen, or they may not see the email and end
up abusing tech support, etc.
3. Use couchdb in a single-writer, multiple-reader scenario. If you
only do that for the activities that require uniqueness, then you
have consistency issues to deal with, because replication is
asynchronous. One way to handle that is to switch a session to the
writable server as soon as you need uniqueness. The single writer
becomes a bottleneck, but this is what I'm doing because it matches
my information architecture.
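A minimal sketch of that routing rule (server URLs and the operation name are invented): reads go to any replica, but anything that needs global uniqueness is pinned to the single writer.

```ruby
# Sketch of option 3's routing (URLs invented): the session switches to
# the writable node as soon as an operation needs a globally consistent
# view, e.g. username registration.
READERS = %w[http://replica1:5984 http://replica2:5984]
WRITER  = 'http://writer:5984'

def server_for(op)
  op == :register_username ? WRITER : READERS.sample
end
```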
4. Use a central specialized server to check uniqueness and generate
an opaque userid token that you would subsequently use as a key (you
shouldn't use the username as a key). An LDAP server or something
like it. Equivalent to the option above, except that the single
server only needs to handle the particular operations requiring
uniqueness.
It's still a single point of failure, but I don't think you can get
around that if you want synchronous global uniqueness testing.
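Stripped down, the central service of option 4 is an atomic check-and-claim that hands back an opaque id. A sketch (class and method names invented, state kept in memory purely for illustration):

```ruby
require 'securerandom'

# Sketch of option 4 (names invented, in-memory for illustration): one
# service owns the username namespace and returns an opaque token to
# use as the document key instead of the username itself.
class UsernameService
  def initialize
    @taken = {}        # username => opaque id
    @lock  = Mutex.new # the service is the single synchronization point
  end

  # returns an opaque id if the name was free, nil if already claimed
  def claim(username)
    @lock.synchronize do
      return nil if @taken.key?(username)
      @taken[username] = SecureRandom.uuid
    end
  end
end
```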
a validating and complete treatment - thanks.
so pretty much my thoughts exactly. the further advantage that #3 has
is that it means *nothing* has to be done up front; it's only when
the app scales out to multiple dbs that work needs to be done and, at
that time, it's presumably justified. i've also considered #4 heavily
(a possible great web service actually...) and will probably go with
some sort of hybrid. that is to say, always get ids from the single
db, but in a way that would require zero code changes if getting an
id later meant hitting some central service
Db.next_id_for('user')
for instance. that way i can run with a single writer, and seamlessly
move to #4 later. i don't even think that would need to be a single
point of failure, as having a couple of those machines would be
trivial if they knew about each other and generated ids with a small
amount of server-based uniqueness in them, for instance
42a
42b
42c
all coming from behind a triple set of id generators. in the end it
does seem like only #3 and #4 are appropriate for systems used by
people.
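the "42a / 42b / 42c" idea above could be sketched like this (class name invented, counters per node):

```ruby
# Sketch of the suffixed-id idea (class name invented): each redundant
# generator keeps its own counter and appends its node letter, so the
# three generators never hand out the same id.
class SuffixedIdGenerator
  def initialize(suffix)
    @suffix = suffix
    @n = 0
  end

  def next_id
    @n += 1
    "#{@n}#{@suffix}"
  end
end

gen_a = SuffixedIdGenerator.new('a')
gen_a.next_id  # => "1a"
gen_a.next_id  # => "2a"
```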
cheers.
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama