Hi all,

We're using ZooKeeper for leader election and system monitoring. We also use it to synchronize our cluster-wide jobs with barriers. We've now run into a case with a single job that each node can fire independently of the others, using its own criteria. If a node fails, another node in our application cluster will need to fire that job instead. I've used Quartz before (we're running Java 6), but it simply isn't designed for our use case. I found this article on Cloudera:
http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/

I've looked at both plugins it describes, but they require Hadoop. We're not currently running Hadoop; we only have Cassandra. Here are the two basic use cases we need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes.
2. The nodes wait until all participants have entered the barrier.
3. The nodes process the data and leave the barrier.
4. Once all nodes have left the barrier, the leader node marks the job as complete.

UC2: Multiple Jobs per Node
1. A job is scheduled for a future time on a specific node (usually the node that creates the trigger).
2. A trigger can be overwritten or cancelled without the job firing.
3. If a node fails, the leader takes all pending jobs from the failed node and partitions them across the remaining nodes.

Any input would be greatly appreciated.

Thanks,
Todd
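For UC2, the node-local trigger handling and the leader's failover step can be sketched using only the JDK, which keeps the sketch Java 6 compatible. This is a minimal sketch under stated assumptions and not a ZooKeeper recipe. NodeTriggers and JobRebalancer are hypothetical names. Each trigger is identified by a string id, and overwriting a trigger cancels the one previously scheduled under that id. The sketch also assumes the leader already knows the failed node's pending trigger ids, for example because each node also records its triggers in ZooKeeper under a per-node path (a hypothetical layout). Reassignment uses plain round-robin.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical per-node trigger table: at most one pending job per trigger id.
 * State is in memory only, so it dies with the node; the leader is assumed to
 * learn the pending trigger ids from somewhere durable (e.g. ZooKeeper).
 */
class NodeTriggers {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final ConcurrentHashMap<String, ScheduledFuture<?>> pending =
            new ConcurrentHashMap<String, ScheduledFuture<?>>();

    /** Schedules a job; an existing trigger with the same id is overwritten (cancelled). */
    public void schedule(String triggerId, Runnable job, long delayMillis) {
        ScheduledFuture<?> future = executor.schedule(job, delayMillis, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = pending.put(triggerId, future);
        if (previous != null) {
            previous.cancel(false); // don't interrupt a job that is already running
        }
    }

    /** Cancels the trigger without firing it; false if it already ran or is unknown. */
    public boolean cancel(String triggerId) {
        ScheduledFuture<?> future = pending.remove(triggerId);
        return future != null && future.cancel(false);
    }

    /** True while the trigger is scheduled or running, i.e. not yet done or cancelled. */
    public boolean isPending(String triggerId) {
        ScheduledFuture<?> future = pending.get(triggerId);
        return future != null && !future.isDone();
    }

    /** Stops the scheduler; shutdownNow() also drops triggers that are still delayed. */
    public void shutdown() {
        executor.shutdownNow();
    }
}

/** Hypothetical leader-side helper: spreads a failed node's pending jobs round-robin. */
class JobRebalancer {
    static Map<String, List<String>> partition(List<String> orphanedJobIds,
                                               List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("no live nodes to take over jobs");
        }
        Map<String, List<String>> assignment = new HashMap<String, List<String>>();
        for (String node : liveNodes) {
            assignment.put(node, new ArrayList<String>());
        }
        // Job i goes to live node (i mod number of live nodes).
        for (int i = 0; i < orphanedJobIds.size(); i++) {
            String node = liveNodes.get(i % liveNodes.size());
            assignment.get(node).add(orphanedJobIds.get(i));
        }
        return assignment;
    }
}
```

In this sketch ZooKeeper would still handle leader election and failure detection. The leader would call JobRebalancer.partition with the failed node's trigger ids and the current live nodes, and each receiving node would re-register those jobs through its own NodeTriggers. Because cancel(false) is used, a job that has already started is allowed to finish rather than being interrupted. Round-robin ignores per-node load, so a weighted split could replace the partition step if that matters.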