Yes, Flavio, there is no duplication of code. I moved the fragment replication part from BKAdmin into the LedgerFragmentReplicator class; see the initial patch at BK-299. It is just a helper class that BKAdmin uses for replicating a fragment. I used this same fragment replicator directly for the ReplicationWorker.

Regards,
Uma
________________________________________
From: Flavio Junqueira [[email protected]]
Sent: Thursday, June 28, 2012 3:10 AM
To: [email protected]
Cc: Ivan Kelly; Rakesh R
Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client

This discussion made me wonder about the relation between the bookie recovery tool and the auto-recovery feature. Does the latter replace the former? Also, if they share code, we want to avoid duplication, yes?

-Flavio

On Jun 27, 2012, at 4:17 PM, Uma Maheswara Rao G wrote:

> Thanks Ivan and Flavio.
>
> I got the point.
>
> And yes, I have seen this with the steps mentioned below. In fact, I used only
> the very lowest part of the code from BKAdmin (only the fragment replication part).
> That part will not have these preventing steps, because it is only responsible for
> the fragment.
>
> We need to build them into ReplicationWorker.
>
> You mean that:
>
>> 1. If the failed bookie is not in the last ensemble of the ledger,
>> recover as normal.
> Fine.
>> 2. If the failed bookie is in the last ensemble of the ledger, we
>> reopen the ledger using fencing. This stops the client from writing
>> any further entries to the ledger. Then recovery can continue as if
>> the ledger had already been closed.
> This can make the NN switch, right?
>
> I think we should have some delay before replication work triggers. Otherwise
> every ensemble change may cause the RW to fence the ledger, right? In fact, the session
> timeout should help here, though there is another case where a delay will not
> help: the ledger is already marked as UR because of some BK in a previous ensemble.
> That can trigger the RW to scan the ledger to find fragments. In my case we keep
> shutting down the BKs and starting them again after some time.
>
> Regards,
> Uma
>
> ________________________________________
> From: Flavio Junqueira [[email protected]]
> Sent: Wednesday, June 27, 2012 7:15 PM
> To: [email protected]
> Cc: Ivan Kelly; Rakesh R
> Subject: Re: Race condition between LedgerChecker and Ensemble reformation
> from client
>
> Hi Uma, check the whole paragraph: "Consequently, we restrict the recovery
> tool to only perform changes to the metadata when the ledger is closed or
> when the ledger writer has detected the bookie crash, has replaced it, and
> reflected the change in the metadata."
>
> It is not only for closed ledgers.
>
> -Flavio
>
> On Jun 27, 2012, at 1:40 PM, Uma Maheswara Rao G wrote:
>
>> Thanks a lot, Flavio, for the reference.
>>
>> Here we are making use of the RecoveryTool code.
>>
>> Also, I have seen the doc saying:
>> "Consequently, we restrict the recovery tool to only perform changes to the
>> metadata when the ledger is closed"
>>
>> In BOOKKEEPER-112, the client tries to handle this metadata failure case,
>> but there is still a case it cannot handle.
>>
>> Here is the case:
>>
>> When one BK in the ensemble fails, the client will try to update the ensemble
>> with a new BK.
>>
>> CLIENT STEP 1: ex: 10 x y z --> 10 x a z
>>
>> BETWEEN STEP 1 AND STEP 2:
>>
>> At this stage, if the RT runs, it may think that there is a missed entry,
>> because a does not have the entry written yet. It may replace a with a new BK
>> again by copying that missed entry.
>>
>> AutoRT updated ensemble ----> 10 x b z
>>
>> CLIENT STEP 2: The client starts writing the failed entry to the pending BKs.
>> Unfortunately it will again try to update the ensemble, but the ensemble
>> known to the client is '10 x a z'.
>>
>> Now the metadata update should fail, as the metadata was changed by the RT.
>>
>> In this case the conflict obviously cannot be resolved. The ledger will be closed as
>>
>> 10 x b z
>>
>> 9 CLOSED
>>
>> Flavio, Ivan and Sijie, what is your opinion on this case?
>>
>> Should it be OK to skip OPENED ledgers? The standby will roll every 2
>> minutes, so up to 2 minutes of data may be in an OPENED ledger.
>>
>> Let's check other scenarios as well.
>>
>> Regards,
>> Uma
>>
>> ________________________________________
>> From: Flavio Junqueira [[email protected]]
>> Sent: Wednesday, June 27, 2012 12:15 PM
>> To: [email protected]
>> Cc: Ivan Kelly; Rakesh R
>> Subject: Re: Race condition between LedgerChecker and Ensemble reformation
>> from client
>>
>> Hi Uma, we have had a related issue in BOOKKEEPER-112, and there is a doc
>> there describing how we deal with it. It might help to give it a look.
>>
>> -Flavio
>>
>> On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote:
>>
>>> Right. But the current replication process considers OPEN ledgers as well.
>>> So, the ledger checker cannot know whether that ensemble was just reformed by
>>> the client or is in progress for a write.
>>>
>>> One way is to skip replication for in-progress ledgers. But the Auditor may
>>> then need to periodically recheck whichever open ledgers it came across?
>>>
>>> IMO, replicating in-progress ledgers may create some inconsistencies.
>>>
>>> Thanks,
>>> Uma
>>> ________________________________________
>>> From: Flavio Junqueira [[email protected]]
>>> Sent: Wednesday, June 27, 2012 4:21 AM
>>> To: [email protected]
>>> Cc: Ivan Kelly; Rakesh R
>>> Subject: Re: Race condition between LedgerChecker and Ensemble reformation
>>> from client
>>>
>>> Hi Uma, it sounds like the replication worker shouldn't have written:
>>>
>>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>>>
>>> If I'm not missing anything, the replication worker should update an
>>> existing entry in the metadata, not create a new entry.
>>>
>>> -Flavio
>>>
>>> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
>>>
>>>> Hi,
>>>>
>>>> It looks like there is a race between LedgerChecker and ensemble reformation
>>>> from the client.
>>>>
>>>> When one bookie in the ensemble quorum fails, the client will try to reform the
>>>> ensemble in handleBookieFailure.
>>>>
>>>> At this time it is reforming the ensemble and resending the write request
>>>> to the new bookie (which is added into the new ensemble).
>>>>
>>>> If the ReplicationWorker triggers on the same ledger at the same time and runs
>>>> the LedgerChecker on it, LedgerChecker may also find this last failed entry
>>>> as a fragment, because the ensemble change is already updated in the metadata.
>>>>
>>>> If the ReplicationWorker replicates this last fragment, then
>>>> ChangeEnsembleCb#operationComplete will fail with BadVersion, because
>>>> the ensemble data was already updated by the ReplicationWorker.
>>>>
>>>> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>>>>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>>>>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
>>>>
>>>> 2012-06-23 10:51:47,814 - ERROR
>>>> [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not
>>>> resolve ledger metadata conflict while changing ensemble to:
>>>> [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta
>>>> data is
>>>> BookieMetadataFormatVersion 1
>>>> 2
>>>> 3
>>>> 0
>>>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>>>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>>>> , new meta data is
>>>> BookieMetadataFormatVersion 1
>>>> 2
>>>> 3
>>>> 0
>>>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>>>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>>>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>>>> ,closing ledger
>>>>
>>>> After this, it will close the ledger:
>>>> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
>>>>
>>>> Then finally the ledger metadata will look like:
>>>>
>>>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>>>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>>>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>>>> 400 CLOSED
>>>>
>>>> Because the last successful entry known to the client is 400. Am I missing something
>>>> here?
>>>>
>>>> Regards,
>>>> Uma
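
The interleaving discussed in this thread can be sketched as a small, single-threaded simulation. This is only an illustration under assumptions: VersionedMetadataStore and EnsembleRaceSketch are hypothetical names, not BookKeeper classes, and the store merely imitates a version-checked (compare-and-set) metadata write. The bookie names x, y, z, a, b follow the example in the thread; c is an extra hypothetical bookie.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical stand-in for a version-checked metadata store (not the BookKeeper API). */
class VersionedMetadataStore {
    private int version = 0;
    // Maps the first entry id of each ensemble to the bookies in that ensemble.
    private TreeMap<Long, List<String>> ensembles = new TreeMap<>();

    synchronized int version() {
        return version;
    }

    synchronized TreeMap<Long, List<String>> read() {
        return new TreeMap<>(ensembles);
    }

    /** Writes only if the caller saw the current version; a false result plays the role of BadVersion. */
    synchronized boolean write(int expectedVersion, TreeMap<Long, List<String>> updated) {
        if (expectedVersion != version) {
            return false;
        }
        ensembles = new TreeMap<>(updated);
        version++;
        return true;
    }
}

class EnsembleRaceSketch {
    /** Returns the outcome of {client step 1, replication worker update, client step 2}. */
    static boolean[] simulate() {
        VersionedMetadataStore store = new VersionedMetadataStore();
        TreeMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, Arrays.asList("x", "y", "z"));
        store.write(store.version(), initial);

        // Client step 1: bookie y fails at entry 10, the client swaps in bookie a.
        int seenByClient = store.version();
        TreeMap<Long, List<String>> clientView = store.read();
        clientView.put(10L, Arrays.asList("x", "a", "z"));
        boolean step1 = store.write(seenByClient, clientView);
        int clientVersion = store.version(); // the client now caches this version

        // Between the steps: the replication worker sees entry 10 missing on a
        // and rewrites the same ensemble to use bookie b.
        int seenByWorker = store.version();
        TreeMap<Long, List<String>> workerView = store.read();
        workerView.put(10L, Arrays.asList("x", "b", "z"));
        boolean workerUpdate = store.write(seenByWorker, workerView);

        // Client step 2: the client attempts another ensemble change from its stale view.
        clientView.put(10L, Arrays.asList("x", "a", "c"));
        boolean step2 = store.write(clientVersion, clientView); // rejected: version moved on

        return new boolean[] {step1, workerUpdate, step2};
    }

    public static void main(String[] args) {
        boolean[] r = simulate();
        System.out.println("client step 1 applied: " + r[0]);
        System.out.println("replication worker update applied: " + r[1]);
        System.out.println("client step 2 applied: " + r[2]);
    }
}
```

In this sketch the client's second write is rejected because the replication worker bumped the version in between, which corresponds to the point in the thread where the client cannot resolve the conflict and closes the ledger.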
