RE: Race condition between LedgerChecker and Ensemble reformation from client

Uma Maheswara Rao G Fri, 29 Jun 2012 12:08:04 -0700

Yes, you are right, Ivan.

The replication worker will reread the metadata. In the replication case we do
not do any sanity checks; it simply takes the new ensemble and continues.
That should be OK.


In the WriteCB of the lh#writeLedgerConfig method:

lh.rereadMetadata(new GenericCallback<LedgerMetadata>() {
    @Override
    public void operationComplete(int rc, LedgerMetadata newMeta) {
        if (rc != BKException.Code.OK) {
            LOG.error("Error reading updated ledger metadata for ledger "
                      + lh.getId());
            ledgerFragmentsMcb.processResult(rc, null, null);
        } else {
            lh.metadata = newMeta;
            writeLedgerConfig();
        }
    }
});


The point below is from your previous post:

>>2. If the failed bookie is in the last ensemble of the ledger, we
>>reopen the ledger using fencing. This stops the client from writing
>>any further entries to the ledger. Then recovery can continue as if
>>the ledger had already been closed.
>>>>>>>>>>>>>>>>>>>>>>
How can a failed bookie be present in the last ensemble? The only case I can
see is when there are multiple bookie failures and ensemble reformation is
still in progress (bookies failed once or twice while writing the same entry).
Within this window, the RW may be triggered and find the fragment
underreplicated, as I explained in my previous post. If my understanding is
correct, how about delaying replication of this last fragment and retrying
after some time? If the client is alive, it gets the chance to change the
ensemble on the next entry, so after the delay this fragment would no longer
be the last fragment.

I am a bit worried about fencing in this situation, because it will cause an
unnecessary NameNode switch.
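
A minimal sketch of the "delay and retry the last fragment" idea, to make the
proposal concrete. Everything here is hypothetical and for illustration only:
the Meta class, the Action enum and the decide method are stand-ins, not
BookKeeper classes, and the real ReplicationWorker may decide this differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class LastFragmentDeferral {

    /** Hypothetical stand-in for ledger metadata (not the BookKeeper class). */
    static final class Meta {
        final boolean closed;
        // First entry id of each fragment -> bookies in that ensemble.
        final TreeMap<Long, List<String>> ensembles = new TreeMap<>();

        Meta(boolean closed) {
            this.closed = closed;
        }

        Meta with(long startEntry, String... bookies) {
            ensembles.put(startEntry, Arrays.asList(bookies));
            return this;
        }
    }

    enum Action { REPLICATE_NOW, RETRY_LATER }

    /**
     * Defer only when the ledger is still open and the fragment is the last
     * one, because a live writer may still replace that ensemble itself.
     */
    static Action decide(Meta meta, long fragmentStartEntry) {
        boolean isLastFragment = !meta.ensembles.isEmpty()
                && meta.ensembles.lastKey() == fragmentStartEntry;
        if (!meta.closed && isLastFragment) {
            return Action.RETRY_LATER;
        }
        return Action.REPLICATE_NOW;
    }
}
```

With the ensembles 0, 10 and 11 from the example in this thread, the fragment
starting at 11 would be deferred while the ledger is open. Once the writer adds
a newer ensemble, the same fragment is no longer last and can be replicated
without fencing.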


Regards,
Uma
________________________________________
From: Ivan Kelly [[email protected]]
Sent: Friday, June 29, 2012 10:11 PM
To: [email protected]
Subject: Re: Race condition between  LedgerChecker and Ensemble reformation 
from client

> I'm thinking, any bookie failure in the inprogress ledger will enter into the 
> race situation, not only the last ensemble of the ledger
>
> Consider the example of the following open/inprogress ledger:-
> L00001
> 0   - A B C
> 10 - A B D
> 11 - A B E
> Say the ReplicationWorker(RW) has chosen this ledger L00001 to recover. Now 
> assume D has rejoined, only C is not running.
> So the RW will re-replicate and update the metadata. This will lead to a
> race condition, as we end up with two writers for the same ledger L00001, and
> cause a BadVersionException in the actual writer bk client. Even though we
> reread and check metadata.resolveConflict(), this will find a data
> mismatch and finally fail the bkclient.
>
> In general, what I understood is that any update to the inprogress ledger by
> the RW would result in a BadVersionException at the client, resulting in an NN
> switch.
>
> Also, an ensemble reformation of an inprogress ledger by the bkclient (actual 
> writer) would cause BadVersionException to the ReplicationWorker side.
> I think, we need to consider this case while designing the ReplicationWorker 
> thread.

Both these cases should be fine. Both handleBookieFailure and CloseOp
will retry if they see the ledger metadata has been updated. For
ReplicationWorker, I assume it'll use the mechanism which is in
BookKeeperAdmin now. This also rereads and retries if it gets a bad
version exception.
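
The reread-and-retry behaviour described here can be sketched roughly as
follows. This is a self-contained illustration under stated assumptions:
MetaStore, Versioned, BadVersionException and updateWithRetry are hypothetical
names, not the BookKeeper or ZooKeeper API. The store only models a versioned
compare-and-set write.

```java
import java.util.function.UnaryOperator;

public class RereadAndRetry {

    /** Hypothetical: thrown when a write is based on a stale version. */
    static final class BadVersionException extends RuntimeException {
    }

    /** Hypothetical: a metadata value together with its store version. */
    static final class Versioned {
        final int version;
        final String value;

        Versioned(int version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    /** In-memory store with compare-and-set semantics on the version. */
    static final class MetaStore {
        private Versioned current = new Versioned(0, "A,B,C");

        synchronized Versioned read() {
            return current;
        }

        synchronized void write(String value, int expectedVersion) {
            if (expectedVersion != current.version) {
                throw new BadVersionException();
            }
            current = new Versioned(current.version + 1, value);
        }
    }

    /**
     * Applies the change to the cached copy; on a bad version it rereads the
     * stored metadata and reapplies the change. Returns the attempts used.
     */
    static int updateWithRetry(MetaStore store, Versioned cached,
                               UnaryOperator<String> change, int maxAttempts) {
        Versioned seen = cached;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.write(change.apply(seen.value), seen.version);
                return attempt;
            } catch (BadVersionException e) {
                seen = store.read(); // someone else updated; reread and retry
            }
        }
        throw new BadVersionException();
    }
}
```

The point of the sketch is that a second writer turns a failed write into one
extra reread rather than a fatal error, provided the change can be reapplied on
top of the newer metadata.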

resolveConflict will only find a data mismatch if the ensemble start
entry has changed, not if the configuration of the ensemble has
changed.
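
As a deliberately simplified reading of that rule (not the actual
resolveConflict code, whose details are not shown in this thread), a check
that compares only the ensemble start entries might look like this. The class
and method names are hypothetical.

```java
import java.util.List;
import java.util.SortedMap;

public class StartEntryConflictCheck {

    /**
     * Reports a conflict only when the two views disagree on where the
     * ensembles start. Which bookies make up each ensemble is ignored,
     * so replacing a bookie inside an ensemble is not a mismatch.
     */
    static boolean hasConflict(SortedMap<Long, List<String>> local,
                               SortedMap<Long, List<String>> stored) {
        return !local.keySet().equals(stored.keySet());
    }
}
```

Under this reading, swapping C for another bookie in the ensemble that starts
at entry 0 is not a conflict, whereas moving an ensemble boundary from entry 10
to some other entry is.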

-Ivan
