Hi,

Thank you. We are trying to build a data service for a special application. At this stage, we are trying to understand how the HA code works.
Thanks
Best Regards
Edward

----- Original Message -----
From: Binu Jose Philip
To: Brilliant
Cc: ha-clusters-discuss at opensolaris.org
Sent: Thursday, December 04, 2008 8:20 PM
Subject: Re: [ha-clusters-discuss] Replica Process discussion

On Wed, Dec 3, 2008 at 1:39 AM, Tim Read - Staff Engineer Solaris Availability Engineering <Tim.Read at sun.com> wrote:
> Edward,
>
> I'm not able to provide any more information here as this involved
> sections of the code that I am not familiar with.
>
> I will try and get one of my more knowledgeable colleagues to post an answer.

With the disclaimer that I am not that knowledgeable colleague ;)

Your understanding of what a replica server does matches mine. Unless there is a failover and retry, the client never contacts the secondary.

My understanding of the steps is as follows.

- client sends request to primary
- primary checkpoints to secondary
- transaction open, transaction has state
- primary performs operations
- another checkpoint if needed
- transaction has more state
* primary replies to client
* client objects created for the invocation send an unreference and thus signal the primary that the invocation is complete
- primary closes transaction (needs confirmation)

or

* primary dies before replying to client
* client retries on new primary, i.e. the old secondary
- transactions replayed before invocation is accepted
* new primary sends reply to client
* client unreference causes new primary to close and commit transaction (needs confirmation)

A commit could also be forced if your replicated service was designed that way. IIRC the global mount subsystem does that.

> the checkpoint structure in the code shows it contains
> replica::ckpt_seq_t minseq;
> replica::ckpt_seq_t value;
> So I wonder how to use these two values?
> --I don't know what the value means, or what is recorded in the checkpoint.

The minseq and value together make sure that checkpoints are replayed on the secondary or the new primary in the correct sequence.
Two counters are needed to make sure wraparound of value is handled correctly. This code snippet should make that clear:
http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/repl/service/multi_ckpt_handler_in.h#52

Could I ask whether you are interested in writing a first-class HA application, or are you trying to understand the code? I ask because, even though I have been working on PxFS for some time, the occasions when I needed to understand checkpoints and transactions have been few and far between. PxFS and mount account for almost all checkpoint activity on a cluster. I am still able to get by, although that is no excuse.

cheers
Binu

> Regards,
>
> Tim
> ---
>