Brian Deacon wrote:
I think I probably don't completely understand the requirements, but it
sounds like we're all too fascinated with handling locking requirements
on a service that doesn't support it well.

Yep. You could use the Amazon service that *does* support distributed locking, but that's one more thing to sign up for.

If there's anything I've discovered about doing this sort of work, each external requirement you impose cuts the interested audience by an order of magnitude.

Would the problem get a lot
simpler if you handled the locking locally, so that each node knew that
it was the winner for the particular task it was taking on?

Sure. That would be great. Suggest a mechanism. ;-) Really, how would you handle local locking on a bunch of independent machines?

I think trying to get S3 to do something it's not good at is maybe not
such a good idea, and being a fairly new service with a lot of traction,
it is likely to better support these scenarios in the future without you
having to build your own mini infrastructure.  Infrastructures are a
dangerous black hole because they're so much fun to write.

The locking part of it is actually a fairly small part of the code. The status reporting and GUI are each at least as big as the negotiation/locking stuff.

Or perhaps the gang of machines do their local work, and then just forward
the actual S3 manipulation on to one server, so that bit of it serializes.

Um, yes. That server would be Amazon S3. ;-) That's what I'm using it for. Alternately, of course, if you want to poke holes in your firewall to let things communicate, you could do that too. Again, another order-of-magnitude sort of thing. As it stands, this runs behind NATted firewalls with no incoming connections needed.

Alright... I just re-read the initial description.  Sounds to me like this
would have been better handled with a local message queue.  The jobs are dumped
into the queue and your gang of machines are consumers of the queue. Smarter people than us have solved these problems already.

Yes. Amazon offers that as a service too. One more thing to sign up for.

Amazon's queue stuff has its own problems, too. You'd still need status reporting and such. The queue locks get automatically broken after some user-defined interval, so unless you set a really long timeout, or put a pre-defined upper limit on how long a single job can run, you get ugly failures (e.g., the same job getting started a second time). It's hard to get the locking right.

OK, so you lock a job, and the process dies because the user forgot to include in the job some of the files it needed. How do you unlock the job, or otherwise stop waiting for it?
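To illustrate why that timeout is so hard to pick: here's a toy in-memory queue with a visibility-timeout-style lock, the way Amazon's queue service behaves. (This is a sketch for illustration only; `ToyQueue` and all its names are made up, not Amazon's API.) A job that outlives the timeout gets its lock broken and is handed to a second worker, which is exactly the double-start failure; a job whose process simply dies gets stuck until the timeout expires, which is the "stop waiting for it" problem.

```python
class ToyQueue:
    """In-memory stand-in for a queue with visibility timeouts.

    When a consumer receives a message, it is hidden (locked) for
    visibility_timeout "seconds"; if the consumer doesn't delete it
    in time, the lock is broken and the message is delivered again.
    Time is passed in explicitly so the behavior is easy to follow.
    """

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # msg_id -> (body, time it becomes visible again)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)  # visible immediately

    def receive(self, now):
        # Hand out the first message whose lock has expired (or was never taken),
        # re-locking it until now + visibility_timeout.
        for msg_id, (body, visible_after) in self.messages.items():
            if now >= visible_after:
                self.messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        # The consumer calls this when the job is done.
        self.messages.pop(msg_id, None)


# A job that takes 30 "seconds" against a 10-second timeout: the lock is
# broken mid-job and a second worker picks up the same job.
q = ToyQueue(visibility_timeout=10)
q.send("job-1", "process the big file")

first = q.receive(now=0)    # worker A takes the job at t=0
second = q.receive(now=5)   # t=5: still locked, nobody else gets it
third = q.receive(now=15)   # t=15: lock broken, worker B starts job-1 again
```

Set the timeout to an hour instead, and a crashed worker silently stalls the job for an hour. That's the trade-off in miniature.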

--
  Darren New / San Diego, CA, USA (PST)
    His kernel fu is strong.
    He studied at the Shao Linux Temple.
