Hi,
Thank you.
We are trying to build a data service for a special application.
At this stage, we are trying to understand how the HA code works.
Thanks
Best regards,
Edward
----- Original Message -----
From: Binu Jose Philip
To: Brilliant
Cc: ha-clusters-discuss at opensolaris.org
Sent: Thursday, December 04, 2008 8:20 PM
Subject: Re: [ha-clusters-discuss] Replica Process discussion
On Wed, Dec 3, 2008 at 1:39 AM, Tim Read - Staff Engineer Solaris
Availability Engineering <Tim.Read at sun.com> wrote:
> Edward,
>
> I'm not able to provide any more information here as this involved
> sections of the code that I am not familiar with.
>
> I will try and get one of my more knowledgeable colleagues to post an answer.
With the disclaimer that I am not that knowledgeable colleague ;)
Your understanding of what a replica server does matches mine. Unless
there is a failover and retry, the client never contacts the secondary. My
understanding of the steps is as follows.
- client sends request to primary
- primary checkpoints to secondary - transaction open, transaction has state
- primary performs operations
- another checkpoint if needed - transaction has more state
then either
  * primary replies to client
  * client objects created for the invocation send an un-reference and thus
    signal the primary that the invocation is complete - primary closes the
    transaction (needs confirmation)
or
  * primary dies before reply to client
  * client retries on the new primary, i.e. the old secondary - transactions
    are replayed before the invocation is accepted
  * new primary sends reply to client
  * client un-reference causes the new primary to close and commit the
    transaction (needs confirmation)
A commit could also be forced if your replicated service was designed
that way. IIRC the global mount subsystem does that.
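
To make the two paths concrete, here is a toy Python sketch of the steps as
I understand them. It is not the cluster code: Replica, client_call and the
other names are made up for illustration, the "operation" is just
upper-casing a string, and the close-on-un-reference behaviour is the part
that still needs confirmation.

```python
class PrimaryDied(Exception):
    """Raised when the (simulated) primary fails before replying."""


class Replica:
    """Hypothetical stand-in for a replica server, not a real cluster class."""

    def __init__(self, name):
        self.name = name
        self.open_txns = {}     # txn id -> list of checkpointed (kind, payload)
        self.committed = set()  # txn ids that have been closed

    def receive_checkpoint(self, txn_id, kind, payload):
        self.open_txns.setdefault(txn_id, []).append((kind, payload))

    def close(self, txn_id):
        self.open_txns.pop(txn_id, None)
        self.committed.add(txn_id)


def invoke_on_primary(primary, secondary, txn_id, request, die_before_reply):
    # Checkpoint first: the transaction is now open and has state.
    secondary.receive_checkpoint(txn_id, "open", request)
    result = request.upper()  # stand-in for the real operation
    # Second checkpoint: the transaction has more state.
    secondary.receive_checkpoint(txn_id, "result", result)
    if die_before_reply:
        raise PrimaryDied(primary.name)
    return result


def retry_on_new_primary(new_primary, txn_id, request):
    # Replay checkpointed state before accepting the retried invocation.
    for kind, payload in new_primary.open_txns.get(txn_id, []):
        if kind == "result":
            return payload  # work already done by the old primary
    return request.upper()  # nothing checkpointed yet, so do the work now


def client_call(primary, secondary, txn_id, request, die_before_reply=False):
    try:
        reply = invoke_on_primary(primary, secondary, txn_id, request,
                                  die_before_reply)
        survivors = [primary, secondary]
    except PrimaryDied:
        # The old secondary is now the new primary.
        reply = retry_on_new_primary(secondary, txn_id, request)
        survivors = [secondary]
    # The client's un-reference signals completion; the transaction is closed.
    for replica in survivors:
        replica.close(txn_id)
    return reply
```

Calling client_call(Replica("node1"), Replica("node2"), 1, "write") follows
the first path; passing die_before_reply=True follows the second, and the
client gets the same reply from the old secondary.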
> The checkpoint structure in the code shows that it contains
>     replica::ckpt_seq_t minseq;
>     replica::ckpt_seq_t value;
> So I am wondering how to use these two values.
> I don't know what value means, or what is recorded in the checkpoint.
The minseq and value together make sure that checkpoints are replayed on
the secondary or the new primary in the correct sequence. Two counters are
needed to make sure wraparound of value is handled correctly.
This code snippet should make that clear.
http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/repl/service/multi_ckpt_handler_in.h#52
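
As an illustration only, and not the logic from multi_ckpt_handler_in.h, here
is one common way a base counter plus a value can give a correct order across
wraparound: sort by the distance of each value from the base, modulo the
counter width. The 32-bit width and the function names are assumptions made
for this sketch.

```python
SEQ_BITS = 32            # assumed width; the real ckpt_seq_t width may differ
SEQ_MOD = 1 << SEQ_BITS


def seq_offset(minseq, value):
    """Distance of value ahead of minseq, modulo the counter width."""
    return (value - minseq) % SEQ_MOD


def replay_order(minseq, values):
    """Order checkpoint sequence numbers as they were issued after minseq."""
    return sorted(values, key=lambda v: seq_offset(minseq, v))
```

With minseq just below the wrap point, a plain numeric sort would replay 0
and 1 first. Sorting by offset from minseq keeps them after the values that
were issued before the counter wrapped.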
Could I ask whether you are interested in writing a first-class HA application,
or are you trying to understand the code? I ask because, even though I have
been working on PxFS for some time, the occasions when I needed to understand
checkpoints and transactions were few and far between. PxFS and mount account
for almost all checkpoint activity on a cluster. I am still able to get by,
although that is no excuse.
cheers
Binu
> Regards,
>
> Tim
> ---
>