On Oct 19, 2008, at 11:34 PM, Antony Blakey wrote:

If you want to ensure that the username is unique at the time the user enters it, then you need a central synchronous service. Using the username/password combination as the unique pair isn't a good idea, because it only takes two naive/lazy users choosing the same password (based, say, on their username? :) for a collision to subsequently occur.

I've been considering this in a production environment, and I saw four solutions:

1. Append some form of unique id from the server you are currently talking to, e.g. a checksum of the machine uuid + process id. Any checksum is going to have some chance of global collision, but it could be made vanishingly small. Not great for the user, because they end up with a complicated username.
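A minimal sketch of that first option, assuming Ruby and a per-machine uuid (the names here are mine, not from the post): hash the machine uuid plus process id and append a few characters of the digest to the chosen username.

```ruby
require 'digest/sha1'
require 'securerandom'

# Stand-in for a stable per-machine identifier; a real deployment would
# read a persistent machine uuid rather than generate one at boot.
MACHINE_UUID = SecureRandom.uuid

# Append a short server-derived suffix to the username, so two servers
# accepting the same name concurrently still produce distinct strings.
def suffixed_username(username)
  digest = Digest::SHA1.hexdigest("#{MACHINE_UUID}-#{Process.pid}")
  "#{username}-#{digest[0, 6]}"   # six hex chars keeps the name readable
end
```

The suffix is stable for a given server and process, so the same user always gets the same full name back; the cost, as noted above, is that the user is stuck with a name they didn't fully choose.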

2. Define your user interaction such that it can deal with subsequently needing to add some suffix to the username, e.g. when you get a replication conflict (which could involve a number of conflicts equal to the number of writable replicas), you amend some/all of the names to include a serial number and then email the user. This complicates things for the user, and they end up with a username they haven't chosen, or they may not see the email and end up calling tech support, etc. etc.

3. Use couchdb in a single-writer, multiple-reader scenario. If you only do that for the activities that require uniqueness, then you have consistency issues to deal with, because replication is asynchronous. One way to handle that is to switch a session to the writable server as soon as it needs uniqueness. The single writer becomes a bottleneck, but this is what I'm doing, because it matches my information architecture.

4. Use a central specialized server to check uniqueness and generate an opaque userid token that you would subsequently use as a key (you shouldn't use the username as a key). An ldap server or something like it. Equivalent to the option above, but the single server only needs to deal with the particular operations requiring uniqueness. It's still a single point of failure, but I don't think you can get around that if you want synchronous global uniqueness testing.
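Option 4 could be sketched, in-memory and without the network layer, as a service that synchronously checks the name and hands back an opaque token to use as the document key (class and method names here are hypothetical, not from the post):

```ruby
require 'securerandom'

# Hypothetical stand-in for the central uniqueness service: it reserves
# a username synchronously and returns an opaque userid token, which the
# application then uses as the key instead of the username itself.
class UniquenessService
  def initialize
    @taken = {}
    @mutex = Mutex.new   # reservations must be atomic under concurrency
  end

  # Returns an opaque token if the name was free, nil if already taken.
  def reserve(username)
    @mutex.synchronize do
      return nil if @taken.key?(username)
      @taken[username] = SecureRandom.uuid
    end
  end
end
```

A real deployment would back this with an ldap server or similar, as suggested above; the essential property is just check-and-reserve in one synchronous step.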



a validating and complete treatment - thanks.

so pretty much my thoughts exactly. the further advantage that #3 has is that it means *nothing* has to be done up front; it's only when the app scales out to multiple dbs that work needs to be done, but at that time it's presumably justified. i've also considered #4 heavily (a possible great web service actually...) and will probably go with some sort of hybrid. that is to say, always get ids from the single db, but in a way that would require zero code changes when getting an id means hitting some central service

  Db.next_id_for('user')

for instance. that way i can run with a single writer, and seamlessly move to #4 later. i don't even think that would need to be a single point of failure, as having a couple of those machines would be trivial if they knew about each other and generated ids with a small amount of server-based uniqueness in them, for instance

  42a
  42b
  42c

all coming from behind a set of three id generators. in the end it does seem like only #3 and #4 are appropriate for systems used by people.
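the suffixed-generator idea above could be sketched like this (class name and api are mine, just to illustrate):

```ruby
# Hypothetical id generator for the scheme above: each generator tags its
# ids with a server letter, so "42a" from server a can never collide with
# "42b" from server b even though the counters run independently.
class IdGenerator
  def initialize(server_tag)
    @server_tag = server_tag
    @counter = 0
    @mutex = Mutex.new   # keep ids unique under concurrent callers
  end

  def next_id
    @mutex.synchronize { "#{@counter += 1}#{@server_tag}" }
  end
end
```

any generator can serve any request, so losing one machine just means its letter stops being issued; the others keep going.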

cheers.


a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being better. simply reflect on that.
h.h. the 14th dalai lama


