Hi all,
  We're using Zookeeper for Leader Election and system monitoring.  We're
also using it for synchronizing our cluster wide jobs with  barriers.  We're
running into an issue where we now have a single job, but each node can fire
the job independently of others with different criteria in the job.  In the
event of a system failure, another node in our application cluster will need
to fire this Job.  I've used quartz previously (we're running Java 6), but
it simply isn't designed for the use case we have.  I found this article on
cloudera.

http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/


I've looked at both plugins, but they require hadoop.  We're not currently
running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as
complete.


UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same
node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs
from the failed node, and partition them across the remaining nodes.


Any input would be greatly appreciated.

Thanks,
Todd

Reply via email to