Yes, you are right, Ivan.
The replication worker will reread the metadata. In the replication case we do
not do any sanity checks; it directly takes the new ensemble and continues.
That should be OK.
This is in the WriteCB of the lh#writeLedgerConfig method:
    lh.rereadMetadata(new GenericCallback<LedgerMetadata>() {
        @Override
        public void operationComplete(int rc, LedgerMetadata newMeta) {
            if (rc != BKException.Code.OK) {
                LOG.error("Error reading updated ledger metadata for ledger " + lh.getId());
                ledgerFragmentsMcb.processResult(rc, null, null);
            } else {
                lh.metadata = newMeta;
                writeLedgerConfig();
            }
        }
    });
> The point below is from your previous post.
>
>>2. If the failed bookie is in the last ensemble of the ledger, we
>>reopen the ledger using fencing. This stops the client from writing
>>any further entries to the ledger. Then recovery can continue as if
>>the ledger had already been closed.
>>>>>>>>>>>>>>>>>>>>>>
How can a failed bookie be present in the last ensemble? The only case I can
see is when there are multiple bookie failures and ensemble reformation is in
progress (bookies failing once or twice while the same entry is being
written). Within this window, the RW may be triggered and find the fragment
under-replicated, as I explained in my previous post. If my understanding is
correct, how about delaying replication of this last fragment and retrying
after some time? The client, if it is alive, will have the chance to change
the ensemble on the next entry, so after the delay this fragment would no
longer be the last fragment.
I am a bit worried about fencing in this situation, because it will cause an
unnecessary Namenode switch.
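
To make the delay proposal concrete, here is a minimal Java sketch. It assumes
a hypothetical Fragment type with a flag marking the last fragment of an open
ledger; none of these names come from the BookKeeper code. It only shows the
idea of setting the last fragment aside for a later retry instead of fencing
right away.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not BookKeeper APIs.
class DeferredReplicationSketch {

    /** Minimal stand-in for an under-replicated ledger fragment. */
    static final class Fragment {
        final long firstEntryId;
        final boolean lastOfOpenLedger; // assumed to be known to the worker

        Fragment(long firstEntryId, boolean lastOfOpenLedger) {
            this.firstEntryId = firstEntryId;
            this.lastOfOpenLedger = lastOfOpenLedger;
        }
    }

    /**
     * Returns the fragments that can be replicated immediately. The last
     * fragment of an open ledger is appended to 'deferred' so the caller can
     * check it again after a delay, by which time a live writer may have
     * moved on to a new ensemble.
     */
    static List<Fragment> replicateNow(List<Fragment> underReplicated,
                                       List<Fragment> deferred) {
        List<Fragment> now = new ArrayList<>();
        for (Fragment f : underReplicated) {
            if (f.lastOfOpenLedger) {
                deferred.add(f);
            } else {
                now.add(f);
            }
        }
        return now;
    }
}
```

How long to wait, and what to do if the fragment is still the last one after
the delay, are open questions that this sketch does not answer.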
Regards,
Uma
________________________________________
From: Ivan Kelly [[email protected]]
Sent: Friday, June 29, 2012 10:11 PM
To: [email protected]
Subject: Re: Race condition between LedgerChecker and Ensemble reformation
from client
> I'm thinking, any bookie failure in the inprogress ledger will enter into the
> race situation, not only the last ensemble of the ledger
>
> Consider the example of the following open/inprogress ledger:-
> L00001
> 0 - A B C
> 10 - A B D
> 11 - A B E
> Say the ReplicationWorker(RW) has chosen this ledger L00001 to recover. Now
> assume D has rejoined, only C is not running.
> So the RW will re-replicate and update the metadata. This will lead to a
> race condition, as we end up with two writers for the same ledger L00001, and
> cause a BadVersionException in the actual writer bk client. Even though we are
> rereading and checking metadata.resolveConflict(), this will find a data
> mismatch and finally fail the bkclient.
>
> In general, what I understood is that any update to the in-progress ledger by
> the RW would result in a BadVersionException for the client, resulting in an
> NN switch.
>
> Also, an ensemble reformation of an inprogress ledger by the bkclient (actual
> writer) would cause BadVersionException to the ReplicationWorker side.
> I think, we need to consider this case while designing the ReplicationWorker
> thread.
Both these cases should be fine. Both handleBookieFailure and CloseOp
will retry if they see the ledger metadata has been updated. For
ReplicationWorker, I assume it'll use the mechanism which is in
BookKeeperAdmin now. This also rereads and retries if it gets a bad
version exception.
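
As a rough illustration of the reread-and-retry pattern, here is a minimal
Java sketch. The Store and Versioned types are hypothetical stand-ins for a
versioned metadata store, not the handleBookieFailure, CloseOp or
BookKeeperAdmin code. A write with a stale version is rejected, and the
updater rereads and reapplies its change.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: a versioned store with compare-and-set writes.
class VersionedRetrySketch {

    /** Metadata value together with the version it was read at. */
    static final class Versioned {
        final int version;
        final String ensemble;

        Versioned(int version, String ensemble) {
            this.version = version;
            this.ensemble = ensemble;
        }
    }

    /** Stand-in for the metadata store; rejects writes with a stale version. */
    static final class Store {
        private Versioned current = new Versioned(0, "A B C");

        synchronized Versioned read() {
            return current;
        }

        /** Returns false (the "bad version" case) if expectedVersion is stale. */
        synchronized boolean write(String ensemble, int expectedVersion) {
            if (current.version != expectedVersion) {
                return false;
            }
            current = new Versioned(expectedVersion + 1, ensemble);
            return true;
        }
    }

    /** Rereads and reapplies 'change' until the write is accepted. */
    static Versioned updateWithRetry(Store store, UnaryOperator<String> change,
                                     int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Versioned seen = store.read(); // reread the latest metadata
            String updated = change.apply(seen.ensemble);
            if (store.write(updated, seen.version)) {
                return new Versioned(seen.version + 1, updated);
            }
            // Bad version: someone else updated the metadata, so loop again.
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

The point is that a bad version alone does not have to fail the writer, as
long as its change can still be applied on top of the newer metadata.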
resolveConflict will only find a data mismatch if the ensemble start
entry has changed, not if the configuration of the ensemble has
changed.
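
A small Java sketch of that distinction, using the L00001 example quoted
above. This is not the actual resolveConflict implementation; the method name
startEntriesMatch and the map representation are assumptions made only for
illustration. It treats two metadata copies as compatible when their ensembles
start at the same entries, regardless of which bookies each ensemble contains.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the real LedgerMetadata conflict check.
class EnsembleConflictSketch {

    /**
     * Compares only the ensemble start entries (the map keys). The bookies
     * listed in each ensemble (the map values) are deliberately ignored.
     */
    static boolean startEntriesMatch(Map<Long, List<String>> local,
                                     Map<Long, List<String>> stored) {
        return local.keySet().equals(stored.keySet());
    }
}
```

With ensembles starting at entries 0, 10 and 11, replacing C in the first
ensemble leaves the start entries unchanged, so this check passes. A copy
whose last ensemble starts at a different entry would not pass.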
-Ivan