Recovery algorithm:

1. Set Master state to recovering and respond to any client requests or
keepalives with "master_recovering" status (clients move into a recovery
state where they don't send new requests but continue sending keepalives).
2. Read the session data persisted in BDB into memory and recreate the
session map.
3. Identify in-progress operations, add them to the "completion map", and
enqueue them in the worker queue.
4. Create another thread to complete session expiration for any sessions
marked for expiry.
5. Set Master state to ready and resume normal operations.
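
To make the ordering of the five steps concrete, here is a rough sketch in
Python. It is only an illustration, not the Hyperspace code: the class and
method names are made up, a plain dict stands in for the session records
stored in BDB, and the record layout ("expired", "in_progress") is an
assumption.

```python
import enum
import queue
import threading


class MasterState(enum.Enum):
    RECOVERING = "master_recovering"
    READY = "ready"


class RecoveringMaster:
    """Hypothetical master; `persisted` stands in for session data in BDB."""

    def __init__(self, persisted):
        # Assumed layout: {session_id: {"expired": bool, "in_progress": [ids]}}
        self.persisted = persisted
        self.state = MasterState.RECOVERING      # step 1
        self.sessions = {}
        self.completion_map = {}
        self.work_queue = queue.Queue()
        self._lock = threading.Lock()

    def handle_request(self, session_id, request):
        # Step 1: while recovering, every request/keepalive gets this status.
        if self.state is MasterState.RECOVERING:
            return MasterState.RECOVERING.value
        return "ok"

    def recover(self):
        # Step 2: rebuild the in-memory session map from persisted data.
        self.sessions = {sid: dict(rec) for sid, rec in self.persisted.items()}
        # Step 3: register and enqueue every in-progress operation.
        for sid, rec in self.sessions.items():
            for rid in rec["in_progress"]:
                self.completion_map[(sid, rid)] = threading.Event()
                self.work_queue.put((sid, rid))
        # Step 4: finish pending session expirations on a separate thread.
        expirer = threading.Thread(target=self._expire_marked_sessions)
        expirer.start()
        # Step 5: resume normal operation.
        self.state = MasterState.READY
        return expirer

    def _expire_marked_sessions(self):
        with self._lock:
            marked = [s for s, rec in self.sessions.items() if rec["expired"]]
            for sid in marked:
                del self.sessions[sid]
```

Note that in this sketch the master goes back to ready without waiting for
the expiration thread, which is how I read step 4.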

To ensure that requests interrupted by a master crash complete correctly,
and that clients can safely retry them, we need request ids (to uniquely
identify requests) and a completion map.

Request ids:
Clients maintain an increasing 64-bit request id and a min-heap of
outstanding request ids. Before sending a request to the server, the client
inserts the new id into the heap, and deletes it after it receives the
server's response. On each keepalive request the client sends the top of the
heap, which is its lowest in-progress request id (or 0 if the heap is
empty). Inserting into and deleting from the heap requires a mutex lock.
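
A minimal sketch of the client-side bookkeeping, in Python for brevity. The
names are hypothetical. Two details are my assumptions rather than part of
the proposal: ids start at 1 so that 0 can mean "nothing outstanding", and
since Python's heapq has no delete-by-value, completed ids are removed
lazily once they reach the top of the heap.

```python
import heapq
import threading


class RequestIdTracker:
    """Hypothetical client-side tracker of outstanding request ids."""

    def __init__(self):
        self._lock = threading.Lock()   # guards the heap on insert/delete
        self._next_id = 1               # assumption: 0 is reserved for "none"
        self._heap = []                 # min-heap of outstanding ids
        self._done = set()              # completed ids not yet popped

    def begin(self):
        """Allocate a new id and record it before the request is sent."""
        with self._lock:
            rid = self._next_id
            self._next_id += 1
            heapq.heappush(self._heap, rid)
            return rid

    def complete(self, rid):
        """Drop an id once the server response has arrived."""
        with self._lock:
            self._done.add(rid)
            # Lazy deletion: pop completed ids only when they reach the top.
            while self._heap and self._heap[0] in self._done:
                self._done.remove(heapq.heappop(self._heap))

    def lowest_in_progress(self):
        """Value to send with each keepalive (0 if nothing is outstanding)."""
        with self._lock:
            return self._heap[0] if self._heap else 0
```

With requests 1, 2 and 3 outstanding, completing 2 leaves the keepalive
value at 1; completing 1 then moves it to 3.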

The server maintains a list of in-progress request ids per session (in BDB)
as well as the most recently purged request id. New ids are added as part
of processing the request. The result of processing the request is also
persisted (the state of the processing might also be stored if needed).
When the server receives a keepalive request it checks the client-reported
in-progress request id. If this value is non-zero and different from the
most recently purged request id (stored at the server), the server updates
the purged id to the new value and deletes the info on all previously stored
requests with smaller ids. (This shouldn't require BDB ops/locks in the
common case, i.e. when there are no in-progress requests.)
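
The keepalive-driven purge on the server could look roughly like the
following. Again this is an illustrative Python sketch with made-up names;
an in-memory dict stands in for the per-session records that would actually
live in BDB.

```python
class SessionRequestLog:
    """Hypothetical per-session request log (a dict stands in for BDB)."""

    def __init__(self):
        self.results = {}      # request id -> result (None while in progress)
        self.last_purged = 0   # most recently purged request id

    def start(self, request_id):
        # Recorded as part of processing the request.
        self.results[request_id] = None

    def finish(self, request_id, result):
        # The result is persisted so a retried request can be answered.
        self.results[request_id] = result

    def on_keepalive(self, lowest_in_progress):
        """Purge entries below the client-reported id; return count purged."""
        # Common case: nothing in progress, or nothing new to purge.
        if lowest_in_progress == 0 or lowest_in_progress == self.last_purged:
            return 0
        stale = [rid for rid in self.results if rid < lowest_in_progress]
        for rid in stale:
            del self.results[rid]
        self.last_purged = lowest_in_progress
        return len(stale)
```

The early return is what keeps the common case cheap: no store access is
needed when the client reports 0 or repeats the id it reported last time.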

Completion map:
During master recovery, all incomplete requests will be enqueued and resumed
from where they left off. In addition, an entry mapping
"session id + request id" --> "completion object" will be created in the
completion map.
Clients will attempt to retry these requests (with a retry flag). When the
master sees the retry flag it will check the completion map and wait for the
completion object to signal that the request processing is complete. The
master then looks up the request result (stored in BDB and possibly in the
completion object) and sends it back to the client.
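
Here is one way the completion map and the retry path might fit together,
sketched in Python. The names are hypothetical, and for simplicity the
result is carried in the completion object itself rather than looked up in
BDB.

```python
import threading


class CompletionObject:
    """Hypothetical completion object: signals when a request has finished."""

    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def complete(self, result):
        # Called by the worker that resumed the interrupted request.
        self.result = result
        self._done.set()

    def wait(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("request did not complete in time")
        return self.result


class CompletionMap:
    """Maps (session id, request id) to a completion object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def register(self, session_id, request_id):
        # Done during recovery for each incomplete request.
        obj = CompletionObject()
        with self._lock:
            self._entries[(session_id, request_id)] = obj
        return obj

    def handle_retry(self, session_id, request_id, timeout=None):
        """Serve a request that arrived with the retry flag set."""
        with self._lock:
            obj = self._entries.get((session_id, request_id))
        if obj is None:
            return None   # not a request that recovery resumed
        return obj.wait(timeout)
```

A retry for an id that is not in the map returns None here; what the master
should do in that case (for example, process it as a new request) is not
covered by this sketch.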


-Sanjit
