Stewart Stremler wrote:
It's sounding like it's less and less a good general-purpose approach
and more and more a specific-problem approach.

Here's a specific example. I have a program that communicates over the Amazon S3 remote distributed file system. The user sets up a bunch of named jobs, then launches off the same process on a dozen different machines, and lets them all take on jobs as they become available.

The S3 semantics are that reads and writes are atomic, but there's no locking or test-and-set mechanism. You can delete a file someone else just deleted without any errors (much to my initial surprise, as that was going to be my test-and-set). You can't create a file that someone else can't write over an instant later. But the contents you read are the complete contents that were written, and what you write gets updated atomically at some point after you write it. So here's (approximately) what I did, described from the point of view of the machine at IP address 10.0.0.5.

1) Negotiate who is the master:

1A) If "Elected" exists, copy its contents to the file named "10.0.0.5" and go to phase 3 if it matches my IP address, phase 2 if it doesn't.

1B) If "Nominated" exists, copy its contents to the file named "10.0.0.5" and go to 1D.

1C) Write the string "10.0.0.5" into "10.0.0.5" and into "Nominated". (Here, we don't think either of those files exist, so we think we're the first to come online. Yes, this is a race condition. See 1F below.)

1D) Read "Nominated". If it does not contain your own IP address and it does not contain what you most recently wrote into the "10.0.0.5" file, write the contents of Nominated into your own "10.0.0.5" file and go back to 1A.

1E) If "Nominated" contained your own IP address, read every other file in the directory besides "Nominated" and "Elected", and see if they all match the contents of "Nominated". If not, go back to 1A.

1F) Here, "Nominated" agrees with every machine's concept of what's in Nominated. I.e., every machine has read the same value out of "Nominated", and that value is ME! So I'm elected. Write "10.0.0.5" into Elected. Go to phase 3.
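
As a rough sketch of phase 1, here is one pass through steps 1A to 1F in Python. It is an illustration under stated assumptions, not the original program: a plain dict stands in for the S3 directory (so it does not model delayed propagation), the function name election_round is made up, and the source does not say what a machine does when "Nominated" names someone else and already matches its own file, so the sketch assumes it simply goes back to 1A and waits for "Elected" to appear.

```python
def election_round(store, me):
    """One pass through 1A-1F. Returns 'master', 'worker', or 'retry'.

    `store` is a dict standing in for the shared directory (an assumption);
    `me` is this machine's IP address as a string.
    """
    # 1A: somebody has already been elected.
    elected = store.get("Elected")
    if elected is not None:
        store[me] = elected
        return "master" if elected == me else "worker"

    # 1B: adopt an existing nomination, or 1C: nominate myself.
    nominated = store.get("Nominated")
    if nominated is not None:
        store[me] = nominated
    else:
        store[me] = me
        store["Nominated"] = me
    last_written = store[me]

    # 1D: re-read "Nominated"; if it changed under us, record it and start over.
    nominated = store.get("Nominated")
    if nominated != me and nominated != last_written:
        store[me] = nominated
        return "retry"

    # Assumption: a machine that is not the nominee just loops back to 1A.
    if nominated != me:
        return "retry"

    # 1E: the nominee checks that every per-machine file agrees.
    votes = [v for k, v in store.items() if k not in ("Nominated", "Elected")]
    if any(v != nominated for v in votes):
        return "retry"

    # 1F: unanimous, so declare victory.
    store["Elected"] = me
    return "master"
```

A caller would invoke this in a loop, sleeping between 'retry' results, and then move on to phase 2 or phase 3 depending on the return value.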

2) The wait-for-work steps:

2A) I'm not elected as master, so write the file "READY-10.0.0.5".

2B) Wait for "ASSIGNED-10.0.0.5" to show up, then delete "READY-10.0.0.5".

2C) Read "ASSIGNED-10.0.0.5". If it says "EXIT", delete my original 10.0.0.5 file from phase 1 and exit.

2D) If it has the name of a job, read and run the job, writing ongoing status to "STATUS-10.0.0.5". When you finish, recreate "READY-10.0.0.5".
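
The worker side of phase 2 might look roughly like the following Python sketch. Again a dict stands in for the S3 directory, and worker_step, run_job, and the status strings are made-up names for illustration. One more assumption is labeled in the code: the steps above don't say who removes the "ASSIGNED" file, so the sketch has the worker delete it after reading, so that a finished job isn't picked up a second time.

```python
def worker_step(store, me, run_job):
    """One pass through 2A-2D. Returns False once told to exit.

    `store` is a dict standing in for the shared directory (an assumption);
    `run_job` is a hypothetical callable that runs a named job.
    """
    ready = f"READY-{me}"
    assigned = f"ASSIGNED-{me}"
    status = f"STATUS-{me}"

    if assigned not in store:
        store[ready] = ""          # 2A: advertise that I'm idle
        return True                # 2B: nothing yet, keep waiting

    store.pop(ready, None)         # 2B: assignment arrived, no longer idle
    job = store.pop(assigned)      # 2C: read it (deleting it is an assumption)

    if job.upper() == "EXIT":
        store.pop(me, None)        # 2C: remove my phase-1 file and quit
        return False

    # 2D: run the job, reporting status as we go (format is hypothetical).
    store[status] = f"running {job}"
    result = run_job(job)
    store[status] = f"done {job}: {result}"
    store[ready] = ""              # ready for the next job
    return True
```

Each call either keeps waiting, runs one job, or reports that the worker should exit, so a real process would call it in a polling loop with a delay between passes.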

3) The assign-work steps:

3A) I'm master. Scan the directory looking for something that begins with "READY". If there's a matching "STATUS", read the status file and store the results, then delete the STATUS file.

3B) Look for a "READY" without a "STATUS". When found, pick a job, assign it by writing it into "ASSIGNED" with the same extension. If none are left, write "EXIT" there.

3C) If there are no "READY" files and no jobs left, exit.

3D) Well, you get the idea.
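
For the master, one pass through steps 3A to 3C could be sketched as below. As before, this is only an illustration: the dict store, the name master_step, and the jobs/results lists are assumptions, and so is the guard that skips a worker whose "ASSIGNED" file is still present, which is there so that a stale "READY" file doesn't cause a pending assignment to be overwritten.

```python
def master_step(store, jobs, results):
    """One pass through 3A-3C. Returns False when the master may exit.

    `store` is a dict standing in for the shared directory (an assumption);
    `jobs` is a list of job names still to hand out;
    `results` collects (worker, status text) pairs.
    """
    ready = [name for name in store if name.startswith("READY-")]
    for name in ready:
        worker = name[len("READY-"):]
        status = f"STATUS-{worker}"
        assigned = f"ASSIGNED-{worker}"

        # 3A: a READY with a matching STATUS means a job just finished.
        if status in store:
            results.append((worker, store.pop(status)))

        # 3B: READY without STATUS, so hand out a job or tell it to quit.
        # Assumption: leave an unconsumed assignment alone.
        if assigned not in store:
            store[assigned] = jobs.pop(0) if jobs else "EXIT"

    # 3C: done only when nobody is waiting and nothing is left to assign.
    return bool(ready) or bool(jobs)
```

The master would call this repeatedly, pausing between passes, until it returns False.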

Note that other than the initial "Nominated" file, nobody ever writes to a file at the same time as anyone else. Indeed, I don't think there's any other file which two processes ever write to. Also, the protocol doesn't depend on the propagation of information being synchronous - it's OK if I overwrite a file and you read the old version a few times before you see the change.

Obviously, test-and-set is easier when you're talking about local memory and stuff. But there's no test-and-set over (say) NFS, and hence locking is messy there.
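
For contrast, here is roughly what test-and-set looks like on a local file system, using Python's os.open with O_CREAT | O_EXCL, which fails if the file already exists. The helper name try_acquire and the temporary path are made up for the example, and this is exactly the kind of primitive that the S3 protocol above has to do without.

```python
import os
import tempfile


def try_acquire(path):
    """Atomically create `path`; return True only for the caller that created it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False               # somebody else got there first
    os.close(fd)
    return True


# Two attempts on the same name: only the first one succeeds.
lock = os.path.join(tempfile.mkdtemp(), "Elected")
first = try_acquire(lock)
second = try_acquire(lock)
```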

--
  Darren New / San Diego, CA, USA (PST)
    His kernel fu is strong.
    He studied at the Shao Linux Temple.

--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
