On Wed, Dec 3, 2008 at 1:39 AM, Tim Read - Staff Engineer Solaris
Availability Engineering <Tim.Read at sun.com> wrote:
> Edward,
>
> I'm not able to provide any more information here as this involves
> sections of the code that I am not familiar with.
>
> I will try to get one of my more knowledgeable colleagues to post an answer.

With the disclaimer that I am not that knowledgeable colleague ;)

Your understanding of what a replica server does matches mine. Unless
there is a failover and a retry, the client never contacts the secondary. My
understanding of the steps is as follows.

- client sends request to primary
- primary checkpoints to secondary - transaction open, transaction has state
- primary performs operations
- another checkpoint if needed - transaction has more state

* primary replies to client
* client objects created for the invocation send an un-reference and thus
  signal the primary that the invocation is complete - primary closes transaction
(needs confirmation)

or

* primary dies before reply to client
* client retries on new primary, i.e. the old secondary - transactions are
  replayed before the invocation is accepted
* new primary sends reply to client
* client un-reference causes new primary to close and commit transaction
(needs confirmation)
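
The two paths above can be sketched in a few lines of C++. This is only a toy model of my understanding, not the cluster framework: every name in it (Secondary, Primary, promote, retry, TxId) is hypothetical, and the "operation" is a string concatenation standing in for real work.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; none of this is the Sun Cluster API.
using TxId = std::uint64_t;

// State a secondary keeps for each transaction that is still open.
struct Secondary {
    std::map<TxId, std::vector<std::string>> open;  // checkpointed state per transaction

    void checkpoint(TxId tx, const std::string& state) {
        assert(!state.empty());  // a checkpoint must carry some state
        open[tx].push_back(state);
    }
    void close(TxId tx) { open.erase(tx); }  // primary saw the client's un-reference
};

struct Primary {
    Secondary* secondary;                 // null when there is no secondary
    std::map<TxId, std::string> results;  // replies kept until the transaction closes

    // Normal path: checkpoint, do the work, checkpoint again, reply.
    std::string invoke(TxId tx, const std::string& request) {
        if (secondary) secondary->checkpoint(tx, "open:" + request);
        std::string reply = "done:" + request;  // stands in for the real operation
        if (secondary) secondary->checkpoint(tx, "result:" + reply);
        results[tx] = reply;
        return reply;
    }

    // Client dropped its reference: the invocation is complete.
    void unreference(TxId tx) {
        results.erase(tx);
        if (secondary) secondary->close(tx);
    }
};

// Failover: the old secondary becomes the primary and replays the open
// transactions before it accepts the client's retry.
Primary promote(const Secondary& s) {
    Primary p{nullptr, {}};
    for (const auto& entry : s.open) {
        for (const auto& state : entry.second) {
            if (state.rfind("result:", 0) == 0) p.results[entry.first] = state.substr(7);
        }
    }
    return p;
}

// A retry returns the replayed result if one was checkpointed, and
// otherwise performs the request on the new primary.
std::string retry(Primary& p, TxId tx, const std::string& request) {
    auto it = p.results.find(tx);
    return it != p.results.end() ? it->second : p.invoke(tx, request);
}
```

The point of the sketch is that the secondary only ever hears from the primary, and that a retry after failover is answered from replayed state rather than by running the operation twice.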

A commit could also be forced if your replicated service was designed
that way. IIRC the global mount subsystem does that.

> The checkpoint structure in the code shows that it contains
> replica::ckpt_seq_t minseq;
> replica::ckpt_seq_t value;
> So I wonder how these two values are used.
> -- I don't know what the value means or what is recorded in the checkpoint.

The minseq and value together make sure that checkpoints are replayed on
the secondary or the new primary in the correct sequence. Two counters are
needed to make sure wraparound of value is handled correctly.

This code snippet should make that clear.
http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/common/cl/repl/service/multi_ckpt_handler_in.h#52
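
Independently of the linked header, the general wraparound idea can be sketched as follows. This is an assumption-laden illustration, not the ohac code: I am assuming a 32-bit unsigned sequence type, treating minseq as the oldest checkpoint not yet replayed and value as the newest one issued, and the CkptWindow and offset_from names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for replica::ckpt_seq_t; the real type may differ.
using ckpt_seq_t = std::uint32_t;

// Distance of seq ahead of the window base, computed modulo 2^32 so that a
// counter that has wrapped past zero still lands after the base.
inline ckpt_seq_t offset_from(ckpt_seq_t minseq, ckpt_seq_t seq) {
    return static_cast<ckpt_seq_t>(seq - minseq);
}

struct CkptWindow {
    ckpt_seq_t minseq;  // assumed: oldest sequence number not yet replayed
    ckpt_seq_t value;   // assumed: newest sequence number issued

    // True when seq lies between minseq and value, allowing for wraparound.
    bool contains(ckpt_seq_t seq) const {
        return offset_from(minseq, seq) <= offset_from(minseq, value);
    }

    // True when checkpoint a must be replayed before checkpoint b.
    bool replay_before(ckpt_seq_t a, ckpt_seq_t b) const {
        assert(contains(a) && contains(b));
        return offset_from(minseq, a) < offset_from(minseq, b);
    }
};
```

With a window of `CkptWindow{0xFFFFFFFEu, 2u}`, a plain `<` would order sequence 1 before 0xFFFFFFFF, whereas comparing offsets from minseq orders 0xFFFFFFFF first, which is why a single counter on its own is not enough.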

Could I ask whether you are interested in writing a first-class HA application
or are trying to understand the code? I ask because, even though I have
been working on PxFS for some time, the occasions on which I needed to
understand checkpoints and transactions have been few and far between. PxFS
and mount account for almost all checkpoint activity on a cluster. I am still
able to get by, although that is no excuse.

cheers
Binu

> Regards,
>
> Tim
> ---
>
>
> On 12/03/08 00:44, Brilliant wrote:
>> Hi Tim,
>> Thank you.
>> Following your answers to my questions, I have some further thoughts about
>> them; please see below.
>> Thanks
>> Edward
>>
>>     ----- Original Message -----
>>     *From:* Tim Read - Staff Engineer Solaris Availability Engineering
>>     <mailto:Tim.Read at Sun.COM>
>>     *To:* yang <mailto:yanggongming at huawei.com>
>>     *Cc:* ha-clusters-discuss at opensolaris.org
>>     <mailto:ha-clusters-discuss at opensolaris.org>
>>     *Sent:* Tuesday, December 02, 2008 10:08 PM
>>     *Subject:* Re: [ha-clusters-discuss] Replica Process discussion
>>
>>     Edward,
>>
>>     One of my colleagues managed to track down the paper. It turns out that
>>     the diagram is the same as the one in the Blueprint that Richard Elling
>>     and I wrote! Having said that, I got it originally from some of the
>>     internal design documents created by the subsequent authors of the code.
>>
>>     More answers inline, though I'll have to leave it to some of my
>>     colleagues who know more about the details of these internals to
>>     provide more detail.
>>
>>     Regards,
>>
>>     Tim
>>     ---
>>
>>     On 12/02/08 02:20, yang wrote:
>>      > Hi all,
>>      > The "sun cluster white paper" has a picture in its description.
>>      > I have some questions: from the picture, I can't figure out how
>>      > the replica process works.
>>      > question1: in step 1, what does the "transfer commit" in "request +
>>      > transfer commit" mean, and what does the "+" mean?
>>
>>     The '+' simply means 'and'. So this is "request and transfer commit".
>>     ---> Why does the client need a transfer commit, and what is the
>>     'transfer commit' for? It is the first request message.
>>            What does 'transfer commit' mean, and is it different from
>>     'commit'?
>>            I prefer to explain the '+ transfer' as "this message needs a
>>     commit from the remote endpoint".
>>
>>      > question2: in step 6, why is a dotted line used, and why is the
>>      > confirm sent to the secondary? In my mind, the client should send it
>>      > to the primary, because the secondary is offline from the client's
>>      > point of view.
>>
>>     If I remember correctly, the solid lines are synchronous operations and
>>     the dotted lines are asynchronous operations. So step 6 is to confirm
>>     to the secondary that the transaction has been completed and they
>>     don't need to 'remember' anything about it. This would free up any
>>     memory held, I guess.
>>
>>     --> The client doesn't know anything about the secondary application,
>>     so why does the client need to send a commit?
>>     I think the client receives the commit request and then sends a
>>     commit to the primary, and the primary sends a copy to the secondary.
>>     So this message requires the server and the replica to interact.
>>     I still don't know why the message is called the "forgot msg".
>>
>>      > question3: in step 5, why does the primary need "+ confirm"?
>>      > The reply already means confirm.
>>
>>     The reply from the primary to the client is the 'commit'. It is
>>     confirming that the transaction has been completed.
>>     ---> The transaction isn't finished, because after receiving the commit
>>     from the secondary, it does a force log and writes to disk. So I still
>>     think '+' means that a confirm or something else is needed.
>>     And from the code, the 'need confirm' is taken from the message received.
>>
>>      > Based on my understanding, I think the replica works like this:
>>      > 1/ The client sends a request to the primary. ("+ transfer commit"
>>      > means a commit is needed, but I don't know why it is "transfer
>>      > commit" rather than just "commit".)
>>      > 2/ The primary does a checkpoint and then sends it to the secondary.
>>      > 3/ The secondary receives the checkpoint and confirms to the primary.
>>      > 4/ The primary receives the commit, then continues on to the reply,
>>      > and keeps the force log for the steps.
>>      > 5/ After finishing, it sends the client a reply with a flag telling
>>      > the client to send the confirm to itself or to the secondary (no
>>      > link to primary, so the primary doesn't need the confirm. How does
>>      > the client send the confirm to the secondary, and since the
>>      > secondary is offline, why send to the secondary?)
>>      >
>>      > In my mind, I think this picture only fits one special kind of
>>      > service. Yet I really do need to know whether the process is right
>>      > and what the implications of the picture are.
>>      >
>>      > The checkpoint structure in the code shows that it contains
>>      > replica::ckpt_seq_t minseq;
>>      > replica::ckpt_seq_t value;
>>      > So I wonder how these two values are used.
>>     -- I don't know what the value means or what is recorded in the
>>     checkpoint.
>>
>>      > Thanks
>>      > Edward
>>
>>     --
>>
>>     Tim Read
>>     Staff Engineer
>>     Solaris Availability Engineering
>>     Sun Microsystems Ltd
>>     Springfield
>>     Linlithgow
>>     EH49 7LR
>>
>>     Phone: +44 (0)1506 672 684
>>     Mobile: +44 (0)7802 212 137
>>
>>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>     NOTICE: This email message is for the sole use of the intended
>>     recipient(s) and may contain confidential and privileged information.
>>     Any unauthorized review, use, disclosure or distribution is prohibited.
>>     If you are not the intended recipient, please contact the sender by
>>     reply email and destroy all copies of the original message.
>>
>>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> _______________________________________________
> ha-clusters-discuss mailing list
> ha-clusters-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
>
