Is it possible to persist only modifying requests, and not the
majority of read/keepalive requests?

On Thu, Sep 3, 2009 at 4:10 PM, Sanjit Jhala<[email protected]> wrote:
> This is required to ensure that the state remains consistent whether or not
> the client retries. On a retry the client provides the master with the
> initial request id, allowing the master to synchronize the response with the
> completion of the initial request.
>
> There could be some requests (e.g. close()) which can cause multiple BDB
> transactions (release lock, grant pending lock requests, delete if
> ephemeral, delete handle) along with having to persist notifications (to
> ensure they are not lost in a crash). A crash in the middle of these
> operations leaves the system in an inconsistent state. The idea is to resume
> incomplete operations while simultaneously handling client retries.
>
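One way to picture "resume incomplete operations" for a multi-step request like close() is a small progress journal. The following is only a rough Python sketch under stated assumptions: `journal` is a plain dict standing in for BDB, and `run_close`, `CLOSE_STEPS`, `apply_step` and `crash_after` are hypothetical names, not part of Hypertable or Hyperspace. It also assumes each step and its journal update would be committed together in one BDB transaction.

```python
# Hypothetical step names taken from the close() example in the message above.
CLOSE_STEPS = [
    "release_lock",
    "grant_pending_locks",
    "delete_if_ephemeral",
    "delete_handle",
]


def run_close(journal, key, apply_step, crash_after=None):
    """Run the remaining close() steps, recording progress after each one.

    journal     -- dict standing in for persisted state (assumption)
    key         -- (session id, request id) tuple
    apply_step  -- callable that performs one named step
    crash_after -- test hook: raise before the step with this index
    """
    done = journal.get(key, 0)  # number of steps already recorded
    for index in range(done, len(CLOSE_STEPS)):
        if crash_after is not None and index >= crash_after:
            raise RuntimeError("simulated crash")
        apply_step(CLOSE_STEPS[index])
        journal[key] = index + 1  # assumed atomic with the step itself
    return journal.get(key, 0) == len(CLOSE_STEPS)
```

After a simulated crash, calling `run_close` again with the same `journal` and `key` picks up at the first unrecorded step instead of repeating the earlier ones.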
> -Sanjit
>
> On Thu, Sep 3, 2009 at 3:35 PM, Luke <[email protected]> wrote:
>>
>> Why does the master need to persist a list of requests if the client can
>> already retry? I think we should store as little as possible in BDB,
>> as it's the bottleneck.
>>
>> On Thu, Sep 3, 2009 at 1:39 PM, Sanjit Jhala<[email protected]> wrote:
>> > Recovery algorithm:
>> >
>> > 1. Set Master state to recovering, respond to any client
>> >    requests/keepalives with "master_recovering" status (clients move
>> >    into a recovery state where they don't send new requests but
>> >    continue sending keepalives)
>> > 2. Read in session data persisted in BDB to memory and recreate the
>> >    session map
>> > 3. Identify in-progress operations, add them to the "completion map"
>> >    and enqueue them in the worker queue
>> > 4. Create another thread to complete the session expiration for any
>> >    sessions marked for expiry
>> > 5. Set Master state to ready and resume normal operations
>> >
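The five recovery steps above could be sketched roughly as follows. This is a minimal Python illustration, not the actual master code: the `Master` and `State` classes, the dict-shaped `store` standing in for BDB, and the `"ok"` reply are all assumptions made for the example. The expiration thread is joined before the state flips to ready only to keep the sketch deterministic; the steps above do not require that.

```python
import queue
import threading
from enum import Enum


class State(Enum):
    RECOVERING = "master_recovering"
    READY = "ready"


class Master:
    def __init__(self, store):
        # `store` is a plain dict standing in for BDB (assumption).
        self.store = store
        self.state = State.RECOVERING  # step 1
        self.sessions = {}
        self.completion_map = {}
        self.work_queue = queue.Queue()

    def handle(self, session_id, request):
        # Step 1: requests and keepalives get the recovering status.
        if self.state is State.RECOVERING:
            return State.RECOVERING.value
        return "ok"  # placeholder for normal processing

    def recover(self):
        # Step 2: rebuild the in-memory session map.
        self.sessions = dict(self.store["sessions"])
        # Step 3: register in-progress operations and enqueue them.
        for session_id, request_id in self.store["in_progress"]:
            key = (session_id, request_id)
            self.completion_map[key] = threading.Event()
            self.work_queue.put(key)
        # Step 4: expire marked sessions on a separate thread.
        expirer = threading.Thread(target=self._expire_marked)
        expirer.start()
        expirer.join()  # joined only so the sketch is deterministic
        # Step 5: resume normal operations.
        self.state = State.READY

    def _expire_marked(self):
        for session_id, info in list(self.sessions.items()):
            if info.get("expire"):
                del self.sessions[session_id]
```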
>> > In order to ensure correct completion of requests interrupted by the
>> > master crash and ensure the client can safely retry these operations,
>> > we need request ids (to uniquely identify requests) and a completion
>> > map.
>> >
>> > Request ids:
>> > Clients maintain an increasing 64-bit request id and a min heap of
>> > outstanding request ids. Before sending a request to the server, the
>> > client inserts the new id into the heap, and deletes it after it
>> > receives a server response. On each keepalive request the client sends
>> > the top of the heap (or 0 if the heap is empty), which is its lowest
>> > in-progress request id. (Requires a mutex lock on insert and delete
>> > from the heap.)
>> >
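A rough Python sketch of the client-side bookkeeping described above, for illustration only. `RequestIdTracker` and its method names are hypothetical. Since responses may arrive out of order and a binary heap has no cheap arbitrary delete, this sketch uses lazy deletion (a set of finished ids that are dropped once they reach the top), which is a substitution on my part rather than something the proposal specifies. It also assumes ids start at 1 so that 0 can mean "nothing outstanding".

```python
import heapq
import threading


class RequestIdTracker:
    def __init__(self):
        self._lock = threading.Lock()  # the mutex mentioned above
        self._next_id = 1              # assumption: 0 is reserved for "none"
        self._heap = []                # min heap of outstanding ids
        self._done = set()             # answered ids still buried in the heap

    def begin(self):
        """Allocate a new request id and mark it outstanding."""
        with self._lock:
            request_id = self._next_id
            self._next_id += 1
            heapq.heappush(self._heap, request_id)
            return request_id

    def finish(self, request_id):
        """Mark a request as answered by the server."""
        with self._lock:
            self._done.add(request_id)
            self._prune()

    def lowest_in_progress(self):
        """Value to send on a keepalive: top of the heap, or 0 if empty."""
        with self._lock:
            self._prune()
            return self._heap[0] if self._heap else 0

    def _prune(self):
        # Drop finished ids once they surface at the top of the heap.
        while self._heap and self._heap[0] in self._done:
            self._done.discard(heapq.heappop(self._heap))
```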
>> > The server maintains a list of in-progress request ids per session (in
>> > BDB) as well as the most recently purged request id. New ids are added
>> > as part of processing the request. The result of processing the
>> > request is also persisted (the state of the processing might also be
>> > stored if needed). When the server receives a keepalive request it
>> > checks the client-reported in-progress request id. If this value is
>> > non-zero and different from the most recently purged request id
>> > (stored at the server), then the server updates to the new id and
>> > deletes info on all previously stored requests with smaller ids.
>> > (Shouldn't require BDB ops/locks in the common case, i.e. no
>> > in-progress requests.)
>> >
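The server-side purge on keepalive might look something like this in Python. It is a sketch under assumptions: `SessionRequests` is a hypothetical per-session record, and the `results` dict stands in for what would be persisted in BDB. The early return models the common case where no store access is needed.

```python
class SessionRequests:
    def __init__(self):
        # request id -> stored result (None while still in progress);
        # a dict stands in for the BDB-backed records (assumption).
        self.results = {}
        self.last_purged = 0  # most recently purged request id

    def record(self, request_id, result=None):
        """Add a request id, optionally with its persisted result."""
        self.results[request_id] = result

    def on_keepalive(self, lowest_in_progress):
        """Purge entries below the client's lowest in-progress id.

        Returns the number of entries deleted.
        """
        # Common case: nothing outstanding, or nothing new to purge.
        if lowest_in_progress == 0 or lowest_in_progress == self.last_purged:
            return 0
        stale = [rid for rid in self.results if rid < lowest_in_progress]
        for rid in stale:
            del self.results[rid]
        self.last_purged = lowest_in_progress
        return len(stale)
```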
>> > Completion map:
>> > During master recovery, all incomplete requests will be enqueued and
>> > resumed from where they left off. In addition, an entry in a completion
>> > map containing "session id+request id" --> "completion object" will be
>> > created. Clients will attempt to retry these requests (with a retry
>> > flag). When the master sees the retry flag it will check the
>> > completion map and wait for the completion object to signal that the
>> > request processing is complete. The master then looks up the request
>> > result (stored in BDB and possibly in the completion object) and sends
>> > it back to the client.
>> >
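For the completion map, a minimal Python sketch could use `threading.Event` as the "completion object". The `CompletionMap` class, its method names, and the in-memory `_results` dict (a stand-in for results stored in BDB) are assumptions for illustration, not the proposed implementation.

```python
import threading


class CompletionMap:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # (session id, request id) -> threading.Event
        self._results = {}  # stand-in for results persisted in BDB

    def register(self, session_id, request_id):
        """Called during recovery for each incomplete request."""
        with self._lock:
            self._pending[(session_id, request_id)] = threading.Event()

    def complete(self, session_id, request_id, result):
        """Called by the worker once the resumed request has finished."""
        key = (session_id, request_id)
        with self._lock:
            self._results[key] = result
            event = self._pending.get(key)
        if event is not None:
            event.set()

    def handle_retry(self, session_id, request_id, timeout=None):
        """Wait for a resumed request, then return its stored result.

        Returns None on timeout or if no result is known for the key.
        """
        key = (session_id, request_id)
        with self._lock:
            event = self._pending.get(key)
        if event is not None and not event.wait(timeout):
            return None
        with self._lock:
            return self._results.get(key)
```

A retry for `(session_id, request_id)` blocks in `handle_retry` until the worker calls `complete` for the same key, then gets back whatever result was stored.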
>> >
>> > -Sanjit
