Ah, thanks Hongchao. Most systems I've seen would solve this problem by
using brokered messaging to ensure in-order delivery of the state to both
servers, which of course has its own problems. It does feel like this is
using the k/v store in a somewhat non-traditional manner, but it's
interesting.

One nuance: I believe etcd does not keep ALL changes, only up to the last
1000 (counted by index), so there is a limitation there.
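To make the limitation concrete, here is a small in-memory model of a change log that keeps only a fixed window of events. It is a hypothetical sketch, not etcd's code: `BoundedHistory`, `IndexClearedError` and the 1000-event default (taken from my estimate above) are assumptions for illustration. A watcher whose last index has fallen out of the window cannot replay from there.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    index: int  # monotonically increasing modification index
    key: str
    value: str


class IndexClearedError(Exception):
    """Raised when the requested index has already fallen out of the window."""


class BoundedHistory:
    """In-memory change log that retains only the newest `capacity` events."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        # deque(maxlen=...) silently drops the oldest event once it is full.
        self.events: deque[Event] = deque(maxlen=capacity)
        self.last_index = 0

    def append(self, key: str, value: str) -> Event:
        self.last_index += 1
        event = Event(self.last_index, key, value)
        self.events.append(event)
        return event

    def since(self, index: int) -> list[Event]:
        """Return every retained event whose index is >= `index`."""
        first_retained = self.last_index - len(self.events) + 1
        if first_retained > 1 and index < first_retained:
            # Events were evicted and the requested one is among them, so the
            # watcher would have to re-read the full current state instead.
            raise IndexClearedError(
                f"index {index} cleared; oldest retained is {first_retained}"
            )
        return [e for e in self.events if e.index >= index]


# With capacity 3, only indexes 3, 4 and 5 survive five writes.
history = BoundedHistory(capacity=3)
for i in range(1, 6):
    history.append(f"app{i}", str(i))
print([e.index for e in history.since(4)])  # [4, 5]
```

Asking `history.since(1)` here raises `IndexClearedError`, which is the case a standby that was disconnected for too long would hit.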


C
On Apr 14, 2015 12:15 AM, "Hongchao Deng" <[email protected]> wrote:

> @Camille asked me how hot standby works in etcd. It was too much to answer
> over the weekend, so I've been organizing the information and will try my
> best to answer now. Better late than never.
>
> The goal is to make physically different nodes behave as the same logical
> identity. To do so, you need two things:
>
> 1. a lease on a file held by the leader: a TTL in etcd, an ephemeral node in ZK.
> 2. a history of what happened.
>
> Let's say an active master stores all its state under "APP_STATE_STORE/".
> - It makes 3 changes: "app1=1, app2=2, app3=3".
> - It then crashes.
> - The lease times out.
> What would a standby do? In etcd, the standby first picks up all three
> changes, and then one more event: the leader node expired (lease timeout).
> etcd guarantees that the standby sees all three changes before it learns
> that the leader is gone. Then the standby tries to create the leader node.
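The failover sequence above can be modeled in memory. This is a hedged sketch, not etcd code: `Store`, `Standby`, the `leader` key and the `set`/`expire` action names are hypothetical. The one property it relies on is the one described above, a single ordered event stream, so the expire event can only be seen after every earlier write. In etcd v2 the "create the leader node" step would roughly correspond to a PUT with `prevExist=false`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LEADER_KEY = "leader"  # hypothetical name of the lease-backed leader node
STATE_PREFIX = "APP_STATE_STORE/"


@dataclass(frozen=True)
class Event:
    index: int
    action: str  # "set" or "expire" in this toy model
    key: str
    value: str | None = None


@dataclass
class Store:
    """Toy k/v store that logs every change in a single ordered history."""
    data: dict[str, str] = field(default_factory=dict)
    log: list[Event] = field(default_factory=list)

    def _record(self, action: str, key: str, value: str | None = None) -> None:
        self.log.append(Event(len(self.log) + 1, action, key, value))

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self._record("set", key, value)

    def expire(self, key: str) -> None:
        # Models the lease timing out: the key vanishes and an event is logged.
        self.data.pop(key, None)
        self._record("expire", key)

    def create_if_absent(self, key: str, value: str) -> bool:
        # Atomic create: fails if someone else already holds the key.
        if key in self.data:
            return False
        self.set(key, value)
        return True


@dataclass
class Standby:
    name: str
    state: dict[str, str] = field(default_factory=dict)
    is_leader: bool = False
    next_index: int = 1

    def catch_up(self, store: Store) -> None:
        # Consume events strictly in index order, as a watcher would.
        while self.next_index <= len(store.log):
            event = store.log[self.next_index - 1]
            self.next_index += 1
            if event.action == "set" and event.key.startswith(STATE_PREFIX):
                self.state[event.key[len(STATE_PREFIX):]] = event.value
            elif event.action == "expire" and event.key == LEADER_KEY:
                # Every earlier change is already applied at this point,
                # so the standby takes over with a complete view of state.
                self.is_leader = store.create_if_absent(LEADER_KEY, self.name)


# The scenario above: three writes, a crash, then the lease expires.
store = Store()
store.create_if_absent(LEADER_KEY, "master")
for app, value in [("app1", "1"), ("app2", "2"), ("app3", "3")]:
    store.set(STATE_PREFIX + app, value)
store.expire(LEADER_KEY)

standby = Standby("standby")
standby.catch_up(store)
print(standby.state, standby.is_leader)
```

If a second standby replays the same log afterwards, it ends up with the same state, but its create fails because the first standby already holds the leader key.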
>
> How does a client get all changes after it reconnects? The trick is to watch
> from the last index it saw: the watch returns the relevant change at the
> smallest index larger than that one. etcd keeps the entire history (in an LSM
> tree), so nothing will be missed.
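The reconnect trick boils down to a "smallest index larger than the last one processed" lookup. A minimal sketch, assuming the history is an in-memory list of `(index, change)` pairs sorted by index; `resume_from` is a hypothetical helper, not an etcd API. With etcd v2 the equivalent would be a watch with `waitIndex` set to the last seen index plus one.

```python
from __future__ import annotations

from bisect import bisect_right


def resume_from(history: list[tuple[int, str]], last_seen: int) -> list[tuple[int, str]]:
    """Return every change whose index is strictly larger than `last_seen`.

    Indexes may have gaps (other keys changed in between), so the first
    returned change is the one at the smallest index larger than `last_seen`.
    """
    indexes = [index for index, _ in history]
    # bisect_right skips past any entry equal to last_seen, which was
    # already processed before the client disconnected.
    start = bisect_right(indexes, last_seen)
    return history[start:]


history = [(3, "app1=1"), (7, "app2=2"), (8, "app3=3")]
print(resume_from(history, 4))  # [(7, 'app2=2'), (8, 'app3=3')]
```

Binary search is used because the history is already sorted by index; a linear scan would give the same answer.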
>
> etcd uses a TTL to refresh the lease, which means going through consensus
> (Raft) on every TTL refresh; that seems costly. I think (not very sure)
> Chubby's lease works more like an ephemeral node kept alive by an RPC with a
> custom TTL:
> - It only goes through the leader.
> - Extending the lease (touching the session) is an RPC call with a custom
> TTL, which isn't bound to TCP keep-alive. This makes it easier to expose the
> API to clients in other languages.
> - The lease is a logical concept, and it's not bound to a single node.
> - On failure, it gives a grace period.
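A rough model of the Chubby-style lease described in the list above, under stated assumptions: `LeaseTable`, `Session`, the method names and the 10-second default grace are all hypothetical. The table lives only in the leader's memory, so a refresh is a local update rather than a consensus round. Each call carries its own TTL and is keyed by a session id, not a TCP connection. Placing the grace period in this table is a simplification chosen for the sketch; the message above does not say where Chubby applies it, and leader failover is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Session:
    expires_at: float  # lease deadline on the leader's clock
    grace: float  # extra time tolerated after the deadline


class LeaseTable:
    """Leader-local table of logical leases keyed by session id.

    Nothing here is tied to a TCP connection: any client that can send an
    RPC carrying the session id can keep the lease alive.
    """

    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}

    def keep_alive(self, session_id: str, ttl: float, now: float, grace: float = 10.0) -> float:
        """Handle a 'touch session' RPC: extend the lease by a caller-chosen TTL."""
        session = self.sessions.get(session_id)
        if session is not None and now > session.expires_at + session.grace:
            # Past the grace period: the old session is gone for good.
            del self.sessions[session_id]
            raise KeyError(session_id)
        self.sessions[session_id] = Session(expires_at=now + ttl, grace=grace)
        return now + ttl

    def status(self, session_id: str, now: float) -> str:
        session = self.sessions.get(session_id)
        if session is None or now > session.expires_at + session.grace:
            return "expired"
        if now <= session.expires_at:
            return "live"
        return "grace"  # deadline missed, but the holder may still recover


leases = LeaseTable()
leases.keep_alive("worker-1", ttl=5, now=0, grace=3)  # lease runs until t=5
print(leases.status("worker-1", now=6))  # grace
```

A client that touches the session again at t=7 is still inside the grace window and gets a fresh deadline; one that waits past t=8 finds the session gone and has to create a new one.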
>
> I think this is quite a tricky problem, but I've seen the need for it
> coming up. Any thoughts?
>
> --
> *- Hongchao Deng*
> *Software Engineer*
>
