Darren New wrote:
> Stewart Stremler wrote:
>> It's sounding like it's less and less a good general-purpose approach
>> and more and more a specific-problem approach.
> 
> Here's a specific example. I have a program that communicates over the
> Amazon S3 remote distributed file system. The user sets up a bunch of
> named jobs, then launches off the same process on a dozen different
> machines, and lets them all take on jobs as they become available.
> 
> The S3 semantics are that reads and writes are atomic, but there's no
> locking or test-and-set mechanism. You can delete a file someone else
> just deleted without any errors (much to my initial surprise, as that
> was going to be my test-and-set). You can't create a file that someone
> else can't write over an instant later. But the contents you read are
> the complete contents that were written, and what you write gets updated
> atomically at some point after you write it. So here's (approximately)
> what I did, described from the point of view of the machine at IP
> address 10.0.0.5.
> 
> 1) Negotiate who is the master:
> 
> 1A) If "Elected" exists, copy its contents to the file named "10.0.0.5"
> and go to phase 3 if it matches my IP address, phase 2 if it doesn't.
> 
> 1B) If "Nominated" exists, copy its contents to the file named
> "10.0.0.5" and go to 1D.
> 
> 1C) Write the string "10.0.0.5" into "10.0.0.5" and into "Nominated".
> (Here, we don't think either of those files exist, so we think we're the
> first to come online. Yes, this is a race condition. See 1F below.)
> 
> 1D) Read "Nominated". If it does not contain your own IP address and it
> does not contain what you most recently wrote into the "10.0.0.5" file,
> write the contents of Nominated into your own "10.0.0.5" file and go
> back to 1A.
> 
> 1E) If "Nominated" contained your own IP address, read every other file
> in the directory besides "Nominated" and "Elected", and see if they all
> match the contents of "Nominated". If not, go back to 1A.
> 
> 1F) Here, "Nominated" agrees with every machine's concept of what's in
> "Nominated". I.e., every machine has read the same value out of
> "Nominated", and that value is ME! So I'm elected. Write "10.0.0.5"
> into "Elected". Go to phase 3.
> 
> 2) The wait-for-work steps:
> 
> 2A) I'm not elected as master, so write the file "READY-10.0.0.5".
> 
> 2B) Wait for "ASSIGNED-10.0.0.5" to show up, then delete "READY-10.0.0.5".
> 
> 2C) Read "ASSIGNED-10.0.0.5". If it says "EXIT", delete my original
> "10.0.0.5" file from phase 1 and exit.
> 
> 2D) If it has the name of a job, read and run the job, writing ongoing
> status to "STATUS-10.0.0.5". When you finish, recreate "READY-10.0.0.5".
> 
> 3) The assign-work steps:
> 
> 3A) I'm master. Scan the directory looking for something that begins
> with "READY". If there's a matching "STATUS", read the status file and
> store the results, then delete the STATUS file.
> 
> 3B) Look for a "READY" without a "STATUS". When found, pick a job and
> assign it by writing its name into the "ASSIGNED" file with the same IP
> suffix. If no jobs are left, write "EXIT" there.
> 
> 3C) If there are no "READY" files and no jobs left, exit.
> 
> 3D) Well, you get the idea.
> 
> Note that other than the initial "Nominated" file, nobody ever writes to
> a file at the same time as anyone else. Indeed, I don't think there's
> any other file that two processes ever write to. Also, the protocol
> doesn't depend on information propagating synchronously - it's OK if
> I overwrite a file and you read the old version a few times before you
> see the change.
> 
> Obviously, test-and-set is easier when you're talking about local memory
> and stuff. But there's no test-and-set over (say) NFS, and hence locking
> is messy there.
> 
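
As a rough sketch, one pass through steps 1A-1F above might look like
this in Python. A plain dict stands in for the S3 directory, which is an
assumption made purely for illustration; the function name
election_round and the "retry" return value are hypothetical, not from
the original program.

```python
RESERVED = ("Nominated", "Elected")


def election_round(store, me):
    """One pass through steps 1A-1F for the machine whose IP is `me`.

    `store` is a dict standing in for the shared directory (an assumption).
    Returns "master", "worker", or "retry" (go back to 1A later).
    """
    # 1A: someone has already been elected.
    elected = store.get("Elected")
    if elected is not None:
        store[me] = elected
        return "master" if elected == me else "worker"

    nominated = store.get("Nominated")
    if nominated is not None:
        # 1B: record the current nominee in my own file.
        last_written = nominated
        store[me] = last_written
    else:
        # 1C: nobody seems to be here yet, so nominate myself (racy).
        last_written = me
        store[me] = last_written
        store["Nominated"] = me

    # 1D: re-read; another machine may have overwritten "Nominated"
    # between the steps above and this read.
    nominated = store["Nominated"]
    if nominated != me and nominated != last_written:
        store[me] = nominated
        return "retry"

    # 1E: I am the nominee; check that every machine's file agrees.
    if nominated == me:
        votes = [v for k, v in store.items() if k not in RESERVED]
        if all(v == nominated for v in votes):
            # 1F: unanimous, so I am elected.
            store["Elected"] = me
            return "master"
    return "retry"
```

Each machine would call this repeatedly (with a pause between calls)
until it gets something other than "retry". The sketch only holds
per-machine files in the dict; a real directory would also hold the
READY/STATUS/ASSIGNED files, which the 1E scan would need to skip.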

I'm sure it sounds more complicated than it is, but it just /feels/
fragile!
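
To make the hand-off concrete, phases 2 and 3 of the quoted protocol
might be sketched roughly as below, again with a dict standing in for
the S3 directory. The function names are hypothetical. Two details are
not stated in the quoted description and are assumptions here: the
worker removes its ASSIGNED file once it has read it, and the master
skips a worker that still has an unread ASSIGNED file.

```python
def worker_step(store, me, run_job):
    """One pass through steps 2A-2D; returns False when told to exit."""
    ready, assigned, status = "READY-" + me, "ASSIGNED-" + me, "STATUS-" + me
    if assigned not in store:
        store[ready] = ""            # 2A: announce that I'm idle
        return True                  # 2B: still waiting for an assignment
    store.pop(ready, None)           # 2B: assignment arrived, drop READY
    job = store.pop(assigned)        # 2C (removing the file is an assumption)
    if job == "EXIT":
        store.pop(me, None)          # delete my phase-1 file
        return False
    store[status] = run_job(job)     # 2D: run the job and record its status
    store[ready] = ""                # 2D: ready for more work
    return True


def master_step(store, jobs, results):
    """One pass through steps 3A-3C; returns False when the master can exit."""
    ready = sorted(k for k in store if k.startswith("READY-"))
    for key in ready:
        ip = key[len("READY-"):]
        status = store.pop("STATUS-" + ip, None)
        if status is not None:
            results.append(status)   # 3A: collect and delete the STATUS file
        if "ASSIGNED-" + ip not in store:    # assumption: don't overwrite
            # 3B: hand out the next job, or EXIT when none are left.
            store["ASSIGNED-" + ip] = jobs.pop(0) if jobs else "EXIT"
    return bool(ready) or bool(jobs)  # 3C: exit when no READY and no jobs
```

Calling master_step and then each worker's worker_step in a loop until
master_step returns False drains the job list. In this single-threaded
sketch a job always finishes inside worker_step; with real machines the
master would also have to allow for workers that are busy and so have
no READY file at the moment it scans.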

Since you're using AWS, have you considered using SQS? It might be
easier and maybe even more versatile.

There's a nice example documented at
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=691&categoryID=102
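
For comparison, here is a minimal sketch of the same job hand-off on a
queue. It assumes the `sqs` argument behaves like the boto3 SQS client
(for example, one created with boto3.client("sqs")) and uses only its
send_message, receive_message and delete_message calls. The names
enqueue_jobs, work_until_empty, queue_url and run_job are hypothetical.

```python
def enqueue_jobs(sqs, queue_url, job_names):
    """Put one message per job name on the queue."""
    for name in job_names:
        sqs.send_message(QueueUrl=queue_url, MessageBody=name)


def work_until_empty(sqs, queue_url, run_job):
    """Pull and run jobs until a receive comes back empty.

    Returns the number of jobs run. An empty receive is treated as
    "no work left", which is a simplifying assumption of this sketch.
    """
    done = 0
    while True:
        reply = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=10,      # long poll rather than spin
        )
        messages = reply.get("Messages", [])
        if not messages:
            return done
        for msg in messages:
            run_job(msg["Body"])
            # Delete only after the job has finished; if this machine dies
            # first, the message becomes visible to other workers again
            # once its visibility timeout expires.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=msg["ReceiptHandle"],
            )
            done += 1
```

With this shape there is no master to elect: every machine runs
work_until_empty against the same queue. Standard SQS queues deliver
at least once, so a job may occasionally run twice and should be safe
to repeat.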

Regards,
..jim

-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
