Is it possible to persist only modifying requests, and not the majority of read/keepalive requests?
On Thu, Sep 3, 2009 at 4:10 PM, Sanjit Jhala <[email protected]> wrote:
> This is required to ensure that the state remains consistent whether or not
> the client retries. On a retry the client provides the master with the
> initial request id, allowing the master to synchronize the response with the
> completion of the initial request.
>
> Some requests (e.g. close()) can cause multiple BDB transactions (release
> lock, grant pending lock requests, delete if ephemeral, delete handle) and
> also have to persist notifications (to ensure they are not lost in a
> crash). A crash in the middle of these operations leaves the system in an
> inconsistent state. The idea is to resume incomplete operations while
> simultaneously handling client retries.
>
> -Sanjit
>
> On Thu, Sep 3, 2009 at 3:35 PM, Luke <[email protected]> wrote:
>>
>> Why does the master need to persist a list of requests if the client can
>> already retry? I think we should store as little as possible in BDB, as
>> it's the bottleneck.
>>
>> On Thu, Sep 3, 2009 at 1:39 PM, Sanjit Jhala <[email protected]> wrote:
>> > Recovery algorithm:
>> >
>> > 1. Set the master state to recovering and respond to any client
>> >    requests/keepalives with "master_recovering" status (clients move
>> >    into a recovery state where they don't send new requests but continue
>> >    sending keepalives).
>> > 2. Read the session data persisted in BDB into memory and recreate the
>> >    session map.
>> > 3. Identify in-progress operations, add them to the "completion map" and
>> >    enqueue them in the worker queue.
>> > 4. Create another thread to complete the session expiration for any
>> >    sessions marked for expiry.
>> > 5. Set the master state to ready and resume normal operations.
>> >
>> > To ensure correct completion of requests interrupted by the master crash,
>> > and to ensure the client can safely retry these operations, we need
>> > request ids (to uniquely identify requests) and a completion map.
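The five recovery steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hypertable code: a plain dict stands in for BDB, only close() is modeled (using the four steps named in the reply above), each in-progress operation is assumed to persist the index of its next step so recovery can resume where the crash interrupted it, and every class, method, and field name is hypothetical.

```python
import queue
import threading

# Hypothetical step names for close(), taken from the list in the thread.
CLOSE_STEPS = ["release_lock", "grant_pending_locks",
               "delete_if_ephemeral", "delete_handle"]


class Master:
    """Toy master; `store` is a dict standing in for the BDB environment."""

    def __init__(self, store):
        self.store = store
        self.state = "recovering"      # step 1: reply "master_recovering" until ready
        self.sessions = {}
        self.sessions_lock = threading.Lock()
        self.completion_map = {}       # (session_id, request_id) -> threading.Event
        self.work_queue = queue.Queue()
        self.performed = []            # executed steps, recorded for illustration

    def keepalive(self, session_id):
        return "master_recovering" if self.state == "recovering" else "ok"

    def recover(self):
        # Step 2: recreate the session map from the persisted session data.
        with self.sessions_lock:
            self.sessions = dict(self.store["sessions"])
        # Step 3: add each in-progress operation to the completion map and enqueue it.
        for key in self.store["in_progress"]:
            self.completion_map[key] = threading.Event()
            self.work_queue.put(key)
        # Step 4: finish expiring marked sessions on a separate thread.
        expirer = threading.Thread(target=self._expire_marked_sessions)
        expirer.start()
        # Step 5: resume normal operation; retries wait on the completion map.
        self.state = "ready"
        return expirer

    def _expire_marked_sessions(self):
        with self.sessions_lock:
            for sid in [s for s, info in self.sessions.items() if info["expiring"]]:
                del self.sessions[sid]

    def run_worker(self):
        # Resume each queued close() from its persisted next step.
        while not self.work_queue.empty():
            key = self.work_queue.get()
            record = self.store["in_progress"][key]
            while record["next_step"] < len(CLOSE_STEPS):
                self.performed.append((key, CLOSE_STEPS[record["next_step"]]))
                # A real system would commit the step and this progress update together.
                record["next_step"] += 1
            self.store["results"][key] = "closed"
            del self.store["in_progress"][key]
            self.completion_map[key].set()   # wake any client retry waiting on this request
```

Setting the state to ready before the worker drains the queue mirrors step 5: a client that retries an interrupted request waits on that request's Event instead of being rejected.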
>> >
>> > Request ids:
>> > Clients maintain an increasing 64-bit request id and a min-heap of
>> > outstanding request ids. Before sending a request to the server, the
>> > client inserts the new id, and it deletes the id after it receives the
>> > server response. On each keepalive request the client sends the top of
>> > the heap (or 0 if the heap is empty), which is its lowest in-progress
>> > request id. (Requires a mutex lock on insert and delete from the heap.)
>> >
>> > The server maintains a list of in-progress request ids per session (in
>> > BDB) as well as the most recently purged request id. New ids are added
>> > as part of processing the request. The result of processing the request
>> > is also persisted (the state of the processing might also be stored if
>> > needed). When the server receives a keepalive request it checks the
>> > in-progress request id reported by the client. If this value is non-zero
>> > and different from the most recently purged request id (stored at the
>> > server), the server updates to the new id and deletes the info on all
>> > previously stored requests with smaller ids. (This shouldn't require BDB
>> > ops/locks in the common case, i.e. no in-progress requests.)
>> >
>> > Completion map:
>> > During master recovery, all incomplete requests will be enqueued and
>> > resumed from where they left off. In addition, an entry in a completion
>> > map mapping "session id + request id" --> "completion object" will be
>> > created. Clients will attempt to retry these requests (with a retry
>> > flag). When the master sees the retry flag it will check the completion
>> > map and wait for the completion object to signal that the request
>> > processing is complete. The master then looks up the request result
>> > (stored in BDB and possibly in the completion object) and sends it back
>> > to the client.
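A minimal sketch of the request-id bookkeeping described above, assuming Python's heapq for the client's min-heap and a plain dict in place of the per-session BDB records. Because heapq cannot delete an arbitrary entry, this sketch removes finished ids lazily once they reach the top of the heap; that choice, and all class and method names, are hypothetical.

```python
import heapq
import itertools
import threading


class ClientRequestIds:
    """Client side: increasing ids plus a min-heap of outstanding ones."""

    def __init__(self):
        self._ids = itertools.count(1)   # stands in for the 64-bit counter
        self._heap = []                  # outstanding ids, smallest on top
        self._finished = set()           # finished ids not yet popped (lazy delete)
        self._lock = threading.Lock()    # heap insert/delete must be mutually exclusive

    def start(self):
        with self._lock:
            rid = next(self._ids)
            heapq.heappush(self._heap, rid)
            return rid

    def finish(self, rid):
        with self._lock:
            self._finished.add(rid)
            # Pop finished ids off the top so the top is always still in progress.
            while self._heap and self._heap[0] in self._finished:
                self._finished.discard(heapq.heappop(self._heap))

    def lowest_in_progress(self):
        # Value sent with each keepalive: top of the heap, or 0 if it is empty.
        with self._lock:
            return self._heap[0] if self._heap else 0


class ServerSession:
    """Server side: per-session request results plus the last purged id."""

    def __init__(self):
        self.results = {}       # request id -> persisted result (BDB in the proposal)
        self.last_purged = 0

    def record(self, rid, result):
        self.results[rid] = result

    def on_keepalive(self, lowest_in_progress):
        # Purge only for a non-zero id that differs from the last one purged.
        if lowest_in_progress != 0 and lowest_in_progress != self.last_purged:
            self.last_purged = lowest_in_progress
            for rid in [r for r in self.results if r < lowest_in_progress]:
                del self.results[rid]
```

Following the proposal as written, a keepalive carrying 0 purges nothing, so the idle case touches no stored state; results recorded earlier are dropped on the next keepalive that reports a new non-zero lowest id.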
>> >
>> > -Sanjit

You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en
