Brian Deacon wrote:
> I think I probably don't completely understand the requirements, but it
> sounds like we're all too fascinated with handling locking requirements
> on a service that doesn't support it well.
Yep. You could use the Amazon service that *does* support distributed
locking, but that's one more thing to sign up for.
If there's one thing I've discovered about doing this sort of work, it's
that each external requirement you impose cuts the interested audience
by an order of magnitude.
> Would the problem get a lot simpler if you handled the locking locally,
> so that each node knew that it was the winner for the particular task
> it was taking on?
Sure. That would be great. Suggest a mechanism. ;-) Really, how would
you handle local locking on a bunch of independent machines?
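To illustrate why this is hard (a toy model, not S3's actual API): the obvious approach of "GET the lock object, and PUT your name if it's absent" races, because two nodes can both see the lock as free before either write lands, and a plain GET/PUT store gives you no compare-and-swap to prevent it:

```python
class ToyStore:
    """Toy shared store with only GET and PUT, like S3's basic
    object interface -- no atomic test-and-set (illustration only)."""
    def __init__(self):
        self.objects = {}

    def get(self, key):
        return self.objects.get(key)

    def put(self, key, value):
        self.objects[key] = value

store = ToyStore()

# Both nodes check the lock before either one writes: both see it free.
a_sees = store.get("lock")   # node A: lock absent
b_sees = store.get("lock")   # node B: lock absent
if a_sees is None:
    store.put("lock", "node-A")
if b_sees is None:
    store.put("lock", "node-B")   # silently overwrites node A's claim

print(store.get("lock"))   # node-B -- yet node A also thinks it won
```

Without an atomic conditional write on the shared store, every check-then-write scheme has this window, which is why the locking keeps pushing you toward an external coordination service.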
> I think trying to get S3 to do something it's not good at is maybe not
> such a good idea, and being a fairly new service with a lot of traction,
> it is likely to better support these scenarios in the future without you
> having to build your own mini infrastructure. Infrastructures are a
> dangerous black hole because they're so much fun to write.
The locking part of it is actually a fairly small part of the code. The
status reporting and GUI are each at least as big as the
negotiation/locking stuff.
> Or perhaps the gang of machines does their local work, and then just
> forwards on the actual S3 manipulation to one server, so that bit of it
> serializes.
Um, yes. That server would be Amazon S3. ;-) That's what I'm using it
for. Alternately, of course, if you want to poke holes in your firewall
to let things communicate, you could do that too. Again, another
order-of-magnitude sort of thing. As it stands, this runs behind NATted
firewalls with no incoming connections needed.
> Alright... I just re-read the initial description. Sounds to me like this
> would've better been handled with a local message queue. The jobs are dumped
> to the queue and your gang of machines are consumers of the queue.
> Smarter people than us have solved these problems already.
Yes. Amazon offers that as a service too. One more thing to sign up for.
Amazon's queue stuff has its own problems, too. You'd still need status
reporting and such. The queue's locks get broken automatically after a
user-defined interval, so unless you set a really long timeout or put a
hard upper limit on how long a single job can run, you get ugly failures
(e.g., the same job starting a second time). It is hard to get the
locking right.
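A minimal simulation of that failure mode (plain Python, no AWS calls; the class and timings are invented for illustration): if a job outlives the visibility timeout, the queue assumes the worker died and hands the same message to a second consumer:

```python
import heapq

class VisibilityQueue:
    """Toy queue with SQS-style visibility timeouts (illustration only)."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.ready = []       # messages available to consumers
        self.inflight = []    # heap of (redelivery_time, message)

    def put(self, msg):
        self.ready.append(msg)

    def get(self, now):
        # Re-queue any in-flight message whose timeout has lapsed.
        while self.inflight and self.inflight[0][0] <= now:
            _, msg = heapq.heappop(self.inflight)
            self.ready.append(msg)
        if not self.ready:
            return None
        msg = self.ready.pop(0)
        heapq.heappush(self.inflight, (now + self.timeout, msg))
        return msg

    def delete(self, msg):
        # A worker that finishes in time removes the job for good.
        self.inflight = [(t, m) for t, m in self.inflight if m != msg]
        heapq.heapify(self.inflight)

q = VisibilityQueue(timeout=30)   # 30-second visibility timeout
q.put("job-1")

# Worker A receives the job at t=0 but needs 60s, longer than the timeout.
a = q.get(now=0)
# At t=31 worker B polls; the timeout has lapsed, so it gets the SAME job.
b = q.get(now=31)
print(a, b)   # job-1 job-1 -- the job is now running twice
```

Setting the timeout long enough to cover the slowest possible job avoids the double start, but then a genuinely dead worker stalls the job for that whole interval, which is the tradeoff being complained about here.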
OK, so you lock a job, and the process dies, because the user forgot to
put in the job some of the files it needed. How do you unlock the job,
or otherwise stop waiting for it?
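One common answer to that question (a sketch of the general lease technique, not anything the thread's software actually does): treat the lock as a lease with an expiry time, which a live holder must keep renewing; a worker that dies simply stops renewing, and the lease lapses on its own:

```python
class Lease:
    """Toy lease-based lock: the holder must renew before ttl elapses,
    so a crashed holder's lock expires by itself (illustration only)."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires = 0.0

    def acquire(self, who, now):
        # Free, or held by a worker that stopped renewing: take it over.
        if self.holder is None or now >= self.expires:
            self.holder = who
            self.expires = now + self.ttl
            return True
        return False

    def renew(self, who, now):
        if self.holder == who and now < self.expires:
            self.expires = now + self.ttl
            return True
        return False

lock = Lease(ttl=10)
assert lock.acquire("worker-1", now=0)      # worker-1 takes the job...
# ...then crashes without releasing. worker-2 is refused while the
# lease is live, but can safely take over once it lapses.
assert not lock.acquire("worker-2", now=5)
assert lock.acquire("worker-2", now=12)
```

Of course, this just moves the problem: `acquire` has to be atomic across machines, which is exactly the primitive the thread says S3 doesn't give you, and choosing the ttl runs into the same too-short/too-long dilemma as the queue timeout.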
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg