@Camille asked me how hot standby works in etcd. It was too much to answer
over the weekend, so I've been organizing my notes and will try my best to
answer now. Better late than never.

The goal is to make physically different processes (the active master and
its standbys) behave as one logical master. To do so, you need two things:

1. a lease on a file (key) granted to the leader: a TTL key in etcd, an
ephemeral node in ZooKeeper (ZK).
2. an ordered history of what happened.
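
Here's a toy in-memory sketch of those two ingredients: a key that only one
process can create and that disappears after its TTL (the lease), plus an
append-only log where every change gets a strictly increasing index (the
history). This is not etcd's API: `ToyStore`, `create_with_ttl`, and the
`/leader` key are hypothetical names, and time is passed in explicitly so the
example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    index: int   # position in the global history, strictly increasing
    action: str  # "set", "create", or "expire"
    key: str
    value: Optional[str]


@dataclass
class ToyStore:
    """Hypothetical single-node store: TTL keys plus an append-only history."""
    # key -> (value, absolute expiry time, or None for no TTL)
    data: Dict[str, Tuple[str, Optional[float]]] = field(default_factory=dict)
    history: List[Event] = field(default_factory=list)

    def _record(self, action: str, key: str, value: Optional[str]) -> None:
        self.history.append(Event(len(self.history) + 1, action, key, value))

    def set(self, key: str, value: str) -> None:
        self.data[key] = (value, None)
        self._record("set", key, value)

    def expire(self, now: float) -> None:
        """Drop keys whose TTL has passed, recording each as an event.
        (A real server would do this on its own timer; here it is explicit.)"""
        for key, (_, deadline) in list(self.data.items()):
            if deadline is not None and deadline <= now:
                del self.data[key]
                self._record("expire", key, None)

    def create_with_ttl(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Create `key` only if it is absent: whoever succeeds holds the lease."""
        self.expire(now)
        if key in self.data:
            return False
        self.data[key] = (value, now + ttl)
        self._record("create", key, value)
        return True


store = ToyStore()
assert store.create_with_ttl("/leader", "master-A", ttl=10, now=0)
store.set("APP_STATE_STORE/app1", "1")
assert not store.create_with_ttl("/leader", "standby-B", ttl=10, now=5)  # still held
assert store.create_with_ttl("/leader", "standby-B", ttl=10, now=11)     # expired at 10
```

Note that the expiration shows up in the history as an ordinary event with its
own index, after every change the old leader made.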

Let's say an active master stores all of its state under "APP_STATE_STORE/".
- It makes 3 changes: "app1=1, app2=2, app3=3".
- It then crashes.
- Its lease times out.
What does a standby do? In etcd, the standby first receives all three
changes, and then one more event: the leader node expired (lease timeout).
etcd guarantees that the standby sees all three changes before it finds out
the leader is gone, because every event, including the expiration, is
ordered by index and delivered in that order. Then the standby tries to
create the leader node.
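
A minimal sketch of the standby's side of this scenario, assuming the watch
delivers events strictly in index order (that ordering is what makes the
guarantee hold). The event tuples and the `Standby` class are illustrative
stand-ins, not etcd's client API; `try_acquire` is a hypothetical callback
standing in for "create the leader node if it doesn't exist".

```python
from typing import Callable, Dict, List, Tuple

# (index, action, key, value): a hypothetical flattening of watch events.
WatchEvent = Tuple[int, str, str, str]

LEADER_KEY = "/leader"
STATE_PREFIX = "APP_STATE_STORE/"


class Standby:
    def __init__(self, try_acquire: Callable[[], bool]) -> None:
        self.state: Dict[str, str] = {}   # mirror of the master's state
        self.last_index = 0               # index of the last applied event
        self.is_leader = False
        self.state_at_takeover: Dict[str, str] = {}
        self._try_acquire = try_acquire

    def on_event(self, ev: WatchEvent) -> None:
        index, action, key, value = ev
        if index <= self.last_index:
            raise ValueError("events must arrive in increasing index order")
        self.last_index = index
        if key.startswith(STATE_PREFIX) and action == "set":
            self.state[key[len(STATE_PREFIX):]] = value
        elif key == LEADER_KEY and action == "expire":
            # Every change committed before the expiration has already been
            # applied above, so the mirrored state is complete at this point.
            if self._try_acquire():
                self.is_leader = True
                self.state_at_takeover = dict(self.state)


events: List[WatchEvent] = [
    (1, "set", "APP_STATE_STORE/app1", "1"),
    (2, "set", "APP_STATE_STORE/app2", "2"),
    (3, "set", "APP_STATE_STORE/app3", "3"),
    (4, "expire", LEADER_KEY, ""),  # master crashed; its lease timed out
]
standby = Standby(try_acquire=lambda: True)  # pretend the create succeeds
for ev in events:
    standby.on_event(ev)
assert standby.is_leader
assert standby.state_at_takeover == {"app1": "1", "app2": "2", "app3": "3"}
```

The standby never has to ask "did I miss anything?" before taking over; the
ordering of the stream answers that for it.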

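How does a client get all the changes if it reconnects? The trick is to
resume the watch from the last index it has seen; the store then returns the
relevant change with the smallest index larger than that one. etcd keeps the
history of changes (in v3, a multi-version store that retains it until
compaction), so nothing is missed as long as that history is still there.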
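
The resume rule can be sketched as a binary search over an indexed history:
given the last index a client saw, return every retained event with a larger
index, starting from the smallest one. `History`, `events_after`, and
`compacted_through` are hypothetical names; the compaction check models the
caveat that a store can only replay history it still has.

```python
import bisect
from typing import List, Tuple

Event = Tuple[int, str]  # (index, description); indices strictly increasing


class CompactedError(Exception):
    """The requested history has been discarded; the client must resync."""


class History:
    def __init__(self) -> None:
        self.indices: List[int] = []
        self.events: List[Event] = []
        self.compacted_through = 0  # events with index <= this are gone

    def append(self, index: int, desc: str) -> None:
        if self.indices and index <= self.indices[-1]:
            raise ValueError("indices must be strictly increasing")
        self.indices.append(index)
        self.events.append((index, desc))

    def compact(self, through: int) -> None:
        cut = bisect.bisect_right(self.indices, through)
        del self.indices[:cut]
        del self.events[:cut]
        self.compacted_through = max(self.compacted_through, through)

    def events_after(self, last_seen: int) -> List[Event]:
        """All retained events with index > last_seen, oldest first."""
        if last_seen < self.compacted_through:
            raise CompactedError(f"history through {self.compacted_through} is gone")
        # bisect_right gives the first position whose index is strictly
        # greater than last_seen: the "smallest larger" index.
        start = bisect.bisect_right(self.indices, last_seen)
        return self.events[start:]


h = History()
for idx, desc in [(3, "app1=1"), (7, "app2=2"), (12, "app3=3")]:
    h.append(idx, desc)
# A client that last saw index 5 (an unrelated change) resumes at 7.
assert h.events_after(5) == [(7, "app2=2"), (12, "app3=3")]
```

The indices of the changes a client cares about are usually sparse, which is
why "the smallest larger index" rather than "last index + 1" is the useful
way to think about where to resume.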

etcd refreshes the lease by updating the key's TTL, and each refresh goes
through Raft consensus, which seems costly. I think (though I'm not very
sure) Chubby implements its lease more like an ephemeral node kept alive by
an RPC with a custom TTL:
- It only goes through the leader (no consensus round per refresh).
- Extending the lease (touching the session) is an RPC with a custom TTL
that isn't tied to a TCP keep-alive. This makes it easier to expose the API
to clients in other languages.
- The lease is a logical concept and is not bound to a single node.
- On failure, it gives clients a grace period.
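
A sketch of that Chubby-style alternative, under the assumption that the
leader tracks session deadlines only in memory, so a refresh is a local
update rather than a replicated write, and that only the set of live sessions
survives a failover. `SessionTable`, `open`, `keep_alive`, `after_failover`,
and `GRACE_PERIOD` are hypothetical; the rule that a new leader gives every
known session a fixed grace period illustrates the idea and is not a claim
about Chubby's exact protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List

GRACE_PERIOD = 45.0  # seconds; an illustrative value


@dataclass
class SessionTable:
    """Leader-local lease bookkeeping; refreshes never touch consensus."""
    deadlines: Dict[str, float] = field(default_factory=dict)

    def open(self, session_id: str, ttl: float, now: float) -> float:
        if ttl <= 0:
            raise ValueError("ttl must be positive")
        self.deadlines[session_id] = now + ttl
        return self.deadlines[session_id]

    def keep_alive(self, session_id: str, ttl: float, now: float) -> float:
        """Handle a KeepAlive RPC carrying the client's own TTL.
        An expired or unknown session cannot be revived this way."""
        if session_id not in self.deadlines:
            raise KeyError(session_id)
        return self.open(session_id, ttl, now)

    def expired(self, now: float) -> List[str]:
        """Remove and return the sessions whose lease has run out."""
        gone = [s for s, d in self.deadlines.items() if d <= now]
        for s in gone:
            del self.deadlines[s]
        return gone

    @classmethod
    def after_failover(cls, session_ids: List[str], now: float) -> "SessionTable":
        """A new leader knows which sessions exist (the logical lease) but not
        the old leader's in-memory deadlines, so it grants a grace period."""
        return cls({s: now + GRACE_PERIOD for s in session_ids})


t = SessionTable()
t.open("client-1", ttl=12, now=0)
t.keep_alive("client-1", ttl=12, now=10)   # deadline moves to 22
assert t.expired(now=20) == []
# The leader crashes at t=21; a new leader takes over at t=30.
t2 = SessionTable.after_failover(["client-1"], now=30)
assert t2.expired(now=60) == []            # still inside 30 + 45
assert t2.expired(now=76) == ["client-1"]  # never reconnected
```

The trade-off this sketch makes visible: refreshes are cheap because they are
local, but the price is that a failover has to be conservative and extend
every lease rather than trust the old deadlines.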

I think this is quite a tricky problem, but I've seen the need for it coming
up. Any thoughts?

-- 
*- Hongchao Deng*
*Software Engineer*
