Darren New wrote:
> Stewart Stremler wrote:
>> It's sounding like it's less and less a good general-purpose approach
>> and more and more a specific-problem approach.
>
> Here's a specific example. I have a program that communicates over the
> Amazon S3 remote distributed file system. The user sets up a bunch of
> named jobs, then launches off the same process on a dozen different
> machines, and lets them all take on jobs as they become available.
>
> The S3 semantics are that reads and writes are atomic, but there's no
> locking or test-and-set mechanism. You can delete a file someone else
> just deleted without any errors (much to my initial surprise, as that
> was going to be my test-and-set). You can't create a file that someone
> else can't write over an instant later. But the contents you read are
> the complete contents that were written, and what you write gets updated
> atomically at some point after you write it. So here's (approximately)
> what I did, described from the point of view of the machine at 10.0.0.5
> IP address.
>
> 1) Negotiate who is the master:
>
> 1A) If "Elected" exists, copy its contents to the file named "10.0.0.5"
> and go to phase 3 if it matches my IP address, phase 2 if it doesn't.
>
> 1B) If "Nominated" exists, copy its contents to the file named
> "10.0.0.5" and go to 1D.
>
> 1C) Write the string "10.0.0.5" into "10.0.0.5" and into "Nominated".
> (Here, we don't think either of those files exist, so we think we're the
> first to come online. Yes, this is a race condition. See 1F below.)
>
> 1D) Read "Nominated". If it does not contain your own IP address and it
> does not contain what you most recently wrote into the "10.0.0.5" file,
> write the contents of Nominated into your own "10.0.0.5" file and go
> back to 1A.
>
> 1E) If "Nominated" contained your own IP address, read every other file
> in the directory besides "Nominated" and "Elected", and see if they all
> match the contents of "Nominated". If not, go back to 1A.
>
> 1F) Here, "Nominated" agrees with every machine's concept of what's in
> Nominated. I.e., every machine has read the same value out of
> "Nominated", and everyone read the same value, and it's ME! So I'm
> elected. Write "10.0.0.5" into Elected. Go to phase 3.
>
> 2) The wait-for-work steps:
>
> 2A) I'm not elected as master, so write the file "READY-10.0.0.5".
>
> 2B) Wait for "ASSIGNED-10.0.0.5" to show up, then delete "READY-10.0.0.5".
>
> 2C) Read "Assigned." If it says "exit", delete my original 10.0.0.5 file
> from phase 1 and exit.
>
> 2D) If it has the name of a job, read and run the job, writing ongoing
> status to "STATUS-10.0.0.5". When you finish, recreate "READY-10.0.0.5".
>
> 3) The assign-work steps:
>
> 3A) I'm master. Scan the directory looking for something that begins
> with "READY". If there's a matching "STATUS", read the status file and
> store the results, then delete the STATUS file.
>
> 3B) Look for a "READY" without a "STATUS". When found, pick a job,
> assign it by writing it into "ASSIGNED" with the same extension. If none
> are left, write "EXIT" there.
>
> 3C) If there are no "READY" files and no jobs left, exit.
>
> 3D) Well, you get the idea.
>
> Note that other than the initial "Nominated" file, nobody ever writes to
> a file at the same time as anyone else. Indeed, I don't think there's
> any file which two processes ever write to. Also, the protocol doesn't
> depend on the propagation of information to be synchronous - it's OK if
> I overwrite a file and you read the old version a few times before you
> see the change.
>
> Obviously, test-and-set is easier when you're talking about local memory
> and stuff. But there's no test-and-set over (say) NFS, and hence locking
> is messy there.
>
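A minimal sketch of the phase 1 negotiation quoted above, to make the control flow easier to follow. It is an illustration under assumptions, not Darren's actual code: the S3 directory is replaced by a plain Python dict, the names `negotiate_once`, `store`, and `me` are hypothetical, and the dict does not model S3's delayed propagation of writes. The handling of a machine that is not the nominee (it simply retries until "Elected" shows up) is also an assumption, since the quoted steps leave that case implicit.

```python
NOMINATED = "Nominated"
ELECTED = "Elected"


def negotiate_once(store, me):
    """One pass through steps 1A-1F for the machine whose IP is `me`.

    `store` is a dict standing in for the S3 directory (hypothetical).
    Returns "master" (go to phase 3), "worker" (go to phase 2),
    or "retry" (go back to 1A).
    """
    # 1A: a master is already elected; record it and pick a phase.
    if ELECTED in store:
        store[me] = store[ELECTED]
        return "master" if store[ELECTED] == me else "worker"

    if NOMINATED in store:
        # 1B: echo the current nomination into my own file.
        store[me] = store[NOMINATED]
    else:
        # 1C: nobody seems to be nominated, so nominate myself (racy).
        store[me] = me
        store[NOMINATED] = me

    # 1D: re-read Nominated; if another machine overwrote it in the
    # meantime, echo the new value and start over. In this single-threaded
    # model the branch never fires; it matters only with concurrent writers.
    nominated = store[NOMINATED]
    if nominated != me and nominated != store[me]:
        store[me] = nominated
        return "retry"

    # 1E: only the nominee checks whether everyone has echoed the nomination.
    if nominated != me:
        return "retry"  # assumption: non-nominees keep polling for Elected
    echoes = [v for k, v in store.items() if k not in (NOMINATED, ELECTED)]
    if any(v != nominated for v in echoes):
        return "retry"

    # 1F: every per-machine file agrees with Nominated, and it names me.
    store[ELECTED] = me
    return "master"
```

Each machine would call `negotiate_once` in a loop (with some sleep between passes) until it gets something other than "retry". The point of the per-machine echo files is that the nominee only declares itself elected once every machine it can see has acknowledged the same nomination.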
I'm sure it sounds more complicated than it is, but it just /feels/ fragile!

Since you're using AWS, have you considered using SQS? It might be easier
and maybe even more versatile. There's a nice example documented at
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=691&categoryID=102

Regards,
..jim
