It seems I will be making my own bicycle...

On Monday, April 2, 2018 at 4:40:28 PM UTC-5, CM wrote:
> Hi,
> I am designing a small distributed job scheduling system with a twist -- 
> each job can be re-executed (idempotent), but the same job can't be executed 
> by two workers in parallel. This requirement makes everything really 
> difficult in the presence of network/worker failures.
> In essence, at a high level it looks like this:
> - Worker -- a process (one of many) that connects to Coordinator, receives 
> jobs, executes them and submits generated sub-jobs back (if any)
> - think "traversing a filesystem": "process this directory" job will 
> generate a bunch of sub-jobs (one for each directory item)
> - Coordinator -- maintains system state, feeds jobs to workers
> - System State -- list of jobs and their current status (executing on 
> worker X, done, etc), can be just a list in memory or a table in a database
> - a job can take a very long time
> I am having difficulty implementing the "no parallel execution" guarantee -- 
> if a worker (or the connection to it) goes down I need to recognize this in 
> the Coordinator, "pause" all jobs that worker was running and (after some 
> timeout or user action) resubmit them to another worker. The timeout (or 
> user action) is required to allow the worker (if it is alive) to detect the 
> network error, stop its jobs and start the cycle again (try to register 
> itself with the Coordinator, etc). It is important that once a connection is 
> deemed broken, it is never reused (otherwise the worker may not notice the 
> problem); the worker is treated as dead until it re-registers itself (after 
> a job purge or restart).
> Can gRPC help me implement this? I feel like I am reinventing the 
> bicycle... This certainly can be done with raw TCP (with manual 
> keep-alives), but I'd like to avoid coding all that logic.
> Regards,
> Michael.
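
For reference, here is a rough sketch of the Coordinator-side bookkeeping described in the quoted message, independent of the transport. It is only an illustration under assumptions: the names (Coordinator, register, heartbeat, next_job, complete, tick), the in-memory dict and the two timeouts (suspect_after, grace) are all made up for this example and are not part of gRPC or any other library. A session that misses heartbeats is dropped for good, its jobs are paused, and they go back to the queue only after the grace period. The guarantee still depends on the worker stopping its jobs within that grace period once its heartbeats stop being acknowledged. Transport-level keepalives (which gRPC offers) may help notice a dead connection sooner, but this logic would still have to live in the application.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

PENDING, RUNNING, PAUSED, DONE = "pending", "running", "paused", "done"


@dataclass
class Job:
    status: str = PENDING
    session: Optional[int] = None  # session currently holding the job
    release_at: float = 0.0        # earliest time a paused job may be reassigned


class Coordinator:
    """Hypothetical in-memory coordinator; timeouts are illustrative values."""

    def __init__(self, suspect_after=10.0, grace=30.0):
        self.suspect_after = suspect_after  # silence before a session is dead
        self.grace = grace                  # time a live worker has to stop its jobs
        self.jobs = {}                      # job id -> Job
        self.sessions = {}                  # session id -> time of last heartbeat
        self._ids = itertools.count(1)      # session ids are never reused

    def add_job(self, job_id):
        self.jobs.setdefault(job_id, Job())

    def register(self, now):
        # Every (re-)registration gets a fresh session id.
        sid = next(self._ids)
        self.sessions[sid] = now
        return sid

    def heartbeat(self, sid, now):
        # False tells the worker: purge your jobs and register again.
        if sid not in self.sessions:
            return False
        self.sessions[sid] = now
        return True

    def next_job(self, sid):
        if sid not in self.sessions:
            return None
        for job_id, job in self.jobs.items():
            if job.status == PENDING:
                job.status, job.session = RUNNING, sid
                return job_id
        return None

    def complete(self, sid, job_id, sub_jobs=()):
        # Results from a dead session are rejected; the job is idempotent,
        # so running it again elsewhere is acceptable.
        job = self.jobs.get(job_id)
        if job is None or job.status != RUNNING or job.session != sid:
            return False
        job.status, job.session = DONE, None
        for sub in sub_jobs:
            self.add_job(sub)
        return True

    def tick(self, now):
        # 1. Drop silent sessions and pause whatever they were running.
        dead = [s for s, seen in self.sessions.items()
                if now - seen > self.suspect_after]
        for sid in dead:
            del self.sessions[sid]
            for job in self.jobs.values():
                if job.status == RUNNING and job.session == sid:
                    job.status, job.release_at = PAUSED, now + self.grace
        # 2. Requeue paused jobs whose grace period has elapsed.
        for job in self.jobs.values():
            if job.status == PAUSED and now >= job.release_at:
                job.status, job.session = PENDING, None
```

The `now` argument is passed in explicitly so the logic can be exercised without real clocks; in a real system tick() would run on a timer, and the "user action" path would simply requeue a paused job before its release time.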
