Hi,
Thank you.
We are trying to build a data service for a special application.
At this stage, we are trying to understand how the HA code works.


Thanks,
Best regards,
Edward
  ----- Original Message ----- 
  From: Binu Jose Philip 
  To: Brilliant 
  Cc: ha-clusters-discuss at opensolaris.org 
  Sent: Thursday, December 04, 2008 8:20 PM
  Subject: Re: [ha-clusters-discuss] Replica Process discussion


  On Wed, Dec 3, 2008 at 1:39 AM, Tim Read - Staff Engineer Solaris
  Availability Engineering <Tim.Read at sun.com> wrote:
  > Edward,
  >
  > I'm not able to provide any more information here as this involves
  > sections of the code that I am not familiar with.
  >
  > I will try to get one of my more knowledgeable colleagues to post an answer.

  With the disclaimer that I am not that knowledgeable colleague ;)

  Your understanding of what a replica server does matches mine. Unless
  there is a failover and retry, the client never contacts the secondary. My
  understanding of the steps is as follows.

  - client sends request to primary
  - primary checkpoints to secondary - transaction open, transaction has state
  - primary performs operations
  - another checkpoint if needed - transaction has more state

  * primary replies to client
  * client objects created for the invocation send an un-reference and thus
    signal the primary that the invocation is complete - primary closes transaction
  (needs confirmation)

  or

  * primary dies before reply to client
  * client retries on the new primary, i.e. the old secondary - transactions replayed
    before invocation is accepted
  * new primary sends reply to client
  * client un-reference causes new primary to close and commit transaction
  (needs confirmation)
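
  The two sequences above can be sketched as a toy model. This only
  illustrates the ordering (checkpoint, operate, reply, un-reference) and a
  retry landing on the new primary. CounterReplica, Reply, invoke and
  unreference are names made up for this example and are not taken from the
  cluster source. It also assumes a checkpoint carries the finished result of
  the operation, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical types for illustration only; not the Open HA Cluster API.
struct Txn { int result; };                  // state carried by a checkpoint
struct Reply { bool delivered; int value; }; // delivered == false: primary died

class CounterReplica {
public:
    // Secondary side: record the state the primary checkpointed for `txn`.
    void checkpoint(int txn, int result) {
        open_[txn] = Txn{result};
        value_ = result;
    }

    // Primary side: checkpoint, perform the operation, then reply.
    // `secondary` may be null. `die_before_reply` simulates a primary
    // failure after the work is done but before the client hears back.
    Reply invoke(int txn, CounterReplica* secondary, bool die_before_reply = false) {
        std::map<int, Txn>::const_iterator it = open_.find(txn);
        if (it != open_.end()) {
            // Retry of a transaction we already hold state for:
            // answer from the checkpoint instead of executing again.
            return Reply{true, it->second.result};
        }
        int result = value_ + 1;
        if (secondary) secondary->checkpoint(txn, result); // transaction open, has state
        value_ = result;                                   // perform the operation
        open_[txn] = Txn{result};
        if (die_before_reply) return Reply{false, 0};
        return Reply{true, result};
    }

    // Client un-reference: the invocation is complete, close the transaction.
    void unreference(int txn, CounterReplica* secondary) {
        assert(open_.count(txn) == 1); // only an open transaction can be closed
        open_.erase(txn);
        if (secondary) secondary->open_.erase(txn);
    }

    int value() const { return value_; }
    std::size_t open_transactions() const { return open_.size(); }

private:
    int value_ = 0;
    std::map<int, Txn> open_;
};
```

  With a primary p and a secondary s, p.invoke(2, &s, true) models the
  primary dying before the reply. The retry s.invoke(2, nullptr) then returns
  the checkpointed result without incrementing a second time.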

  A commit could also be forced if your replicated service was designed
  that way. IIRC the global mount subsystem does that.

  > The checkpoint structure in the code shows that it contains
  > replica::ckpt_seq_t minseq;
  > replica::ckpt_seq_t value;
  > so I am wondering how these two values are used.
  > -- I don't know what the value means or what is recorded in the checkpoint.

  The minseq and value together make sure that checkpoints are replayed on
  the secondary or the new primary in the correct sequence. Two counters are
  needed to make sure wraparound of value is handled correctly.

  The code at this link should make that clear:
  
http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/repl/service/multi_ckpt_handler_in.h#52
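
  For readers without the source at hand, here is a generic sketch of why a
  base plus a counter copes with wraparound. It is not the code at the link
  above. The 32-bit width, the half-range window and the function names are
  all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for replica::ckpt_seq_t; the real typedef may differ.
typedef std::uint32_t ckpt_seq_t;

// Assumed for this sketch: fewer than 2^31 checkpoints are outstanding at once.
const ckpt_seq_t kMaxOutstanding = 0x80000000u;

// How far `value` is ahead of the window base `minseq`, modulo 2^32.
// Unsigned subtraction wraps, so this stays correct after `value` overflows.
inline ckpt_seq_t offset_from_min(ckpt_seq_t minseq, ckpt_seq_t value) {
    return static_cast<ckpt_seq_t>(value - minseq);
}

// True if `value` lies inside the window that starts at `minseq`.
inline bool in_window(ckpt_seq_t minseq, ckpt_seq_t value) {
    return offset_from_min(minseq, value) < kMaxOutstanding;
}

// True if checkpoint `a` has to be replayed before checkpoint `b`.
// Comparing offsets from minseq, rather than raw values, keeps the order
// right even when `b` has wrapped past zero and `a` has not.
inline bool replay_before(ckpt_seq_t minseq, ckpt_seq_t a, ckpt_seq_t b) {
    assert(in_window(minseq, a) && in_window(minseq, b));
    return offset_from_min(minseq, a) < offset_from_min(minseq, b);
}
```

  A raw comparison would put sequence number 3 before 0xFFFFFFFE. With
  minseq at 0xFFFFFFF0 the offsets are 19 and 14, so the wrapped checkpoint
  is correctly replayed second.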

  Could I ask whether you are interested in writing a first-class HA application,
  or are you trying to understand the code? I ask because even though I have
  been working on PxFS for some time, I have rarely needed to understand
  checkpoints and transactions. PxFS and mount account for almost all checkpoint
  activity on a cluster. I am still able to get by, although that is no excuse.

  cheers
  Binu

  > Regards,
  >
  > Tim
  > ---
  >