> I am having difficulty implementing a "no parallel execution" guarantee --
> if a worker (or the connection to it) goes down, I need to recognize this in
> the Coordinator, "pause" all jobs that worker was running and (after some
> timeout or user action) resubmit the jobs to another worker. The timeout (or
> user action) is required to allow the worker (if it is alive) to detect the
> network error, stop its jobs and start the cycle again (try to register itself
> with the Coordinator, etc). It is important that once a connection is deemed
> broken, it is never reused (or the worker may not notice the problem); the
> worker is treated as dead until it re-registers itself (after a job purge or
gRPC doesn't have this sort of intrinsic.
The interesting part here smells like a variation on distributed locking.
You may want to look at something like ZooKeeper.
You could use gRPC messages to do things like communicate the lock names.
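
To make the locking idea a bit more concrete, here is a rough, in-memory sketch of the lease-style bookkeeping a Coordinator might keep. It is only an illustration under assumptions: the names (Coordinator, Session, register, heartbeat, assign, reclaim) and the lease/grace values are made up for this example and are not gRPC or ZooKeeper APIs, and the caller passes the time in explicitly. Each registration gets a fresh epoch that is never reused, a missed lease pauses the session's jobs, and the jobs are only released after an extra grace period.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Session:
    worker: str
    epoch: int            # unique per registration, never reused
    expires_at: float     # lease deadline, pushed forward by heartbeats
    jobs: set = field(default_factory=set)
    dead: bool = False    # once True, this session never becomes live again


class Coordinator:
    """Hypothetical lease bookkeeping; all names here are assumptions."""

    def __init__(self, lease=10.0, grace=5.0):
        self.lease = lease    # how long one heartbeat keeps a session alive
        self.grace = grace    # extra wait so a live worker can stop its jobs
        self._epochs = itertools.count(1)
        self.sessions = {}    # epoch -> Session

    def register(self, worker, now):
        # A worker always gets a brand-new session; old epochs stay dead.
        s = Session(worker, next(self._epochs), now + self.lease)
        self.sessions[s.epoch] = s
        return s.epoch

    def _live(self, epoch, now):
        s = self.sessions.get(epoch)
        if s is None or s.dead:
            return None
        if now > s.expires_at:
            s.dead = True     # lease missed: treat the worker as dead
            return None
        return s

    def heartbeat(self, epoch, now):
        s = self._live(epoch, now)
        if s is None:
            return False      # worker must stop its jobs and re-register
        s.expires_at = now + self.lease
        return True

    def assign(self, epoch, job, now):
        s = self._live(epoch, now)
        if s is None:
            return False
        # Refuse while any session, even a paused one, still holds the job.
        if any(job in t.jobs for t in self.sessions.values()):
            return False
        s.jobs.add(job)
        return True

    def reclaim(self, now):
        """Return jobs whose session expired and whose grace period passed."""
        freed = []
        for s in self.sessions.values():
            if now > s.expires_at:
                s.dead = True                 # jobs are now "paused"
                if now > s.expires_at + self.grace:
                    freed.extend(s.jobs)      # safe to resubmit elsewhere
                    s.jobs.clear()
        return sorted(freed)
```

The worker side would need the matching rule: if it has not had a successful heartbeat within the lease, it stops its jobs on its own, and the grace period is the margin for it to do so. A system like ZooKeeper gives you comparable session-timeout and ephemeral-node behavior without having to build and replicate this state yourself.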