> I am having difficulty implementing a "no parallel execution" guarantee:
> if a worker (or the connection to it) goes down, I need to recognize this in
> the Coordinator, "pause" all jobs the given worker was running, and (after some
> timeout or user action) resubmit the jobs to another worker. The timeout (or
> user action) is required to allow the worker (if it is alive) to detect the
> network error, stop its jobs, and start the cycle again (try to register
> itself with the Coordinator, etc.). It is important that once a connection is
> deemed broken, it is never reused (or the worker may not notice the problem);
> the worker is treated as dead until it re-registers itself (after a job purge
> or restart).

gRPC doesn't have these sorts of intrinsics.

The interesting part here smells like a variation on distributed locking.
You may want to look at something like ZooKeeper.

You could use gRPC messages to do things like communicate the lock names.
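
To make the locking idea concrete, here is a minimal in-memory sketch of the lease bookkeeping a Coordinator might keep. It assumes each worker holds a session that expires after a period of silence, that a dead session id is never reused, and that jobs are released for resubmission only after an extra grace period. The names (Coordinator, Session, lease, grace) are hypothetical and are not part of gRPC or ZooKeeper; a real deployment would more likely delegate this to something like ZooKeeper sessions and ephemeral nodes, and would need to account for clock differences between machines.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Session:
    worker: str
    last_seen: float
    dead_since: Optional[float] = None  # set once, never cleared
    jobs: Set[str] = field(default_factory=set)


class Coordinator:
    """Hypothetical lease tracker; times are passed in to keep it testable."""

    def __init__(self, lease: float, grace: float) -> None:
        self.lease = lease  # max silence before a session is declared dead
        self.grace = grace  # extra wait so a live worker can stop its jobs
        self._ids = itertools.count(1)
        self.sessions: Dict[int, Session] = {}

    def register(self, worker: str, now: float) -> int:
        # Every registration gets a fresh id, so a dead session is never reused.
        sid = next(self._ids)
        self.sessions[sid] = Session(worker, last_seen=now)
        return sid

    def _alive(self, sid: int, now: float) -> bool:
        s = self.sessions.get(sid)
        if s is None:
            return False
        if s.dead_since is None and now - s.last_seen > self.lease:
            s.dead_since = now  # jobs of this session are now "paused"
        return s.dead_since is None

    def heartbeat(self, sid: int, now: float) -> bool:
        # False tells the worker to stop its jobs and register again.
        if not self._alive(sid, now):
            return False
        self.sessions[sid].last_seen = now
        return True

    def assign(self, sid: int, job: str, now: float) -> bool:
        if not self._alive(sid, now):
            return False
        self.sessions[sid].jobs.add(job)
        return True

    def reap(self, now: float) -> List[str]:
        # Return jobs whose session has been dead for at least the grace period.
        released: List[str] = []
        for sid in list(self.sessions):
            s = self.sessions[sid]
            if not self._alive(sid, now) and now - s.dead_since >= self.grace:
                released.extend(s.jobs)
                del self.sessions[sid]
        return sorted(released)
```

The guarantee only holds if the worker applies the matching rule on its side: it stops its jobs once it has gone longer than the lease without a successful heartbeat, and the grace period is chosen to be comfortably longer than the time the worker needs to notice and stop. The heartbeat, register, and assign calls could be carried as ordinary gRPC messages.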

-- 
Christopher Warrington
