That's cool. I'm still wondering about the bookie recovery tool. Is there still a need for such a tool or the replication scheme will supersede it? What's your opinion?
-Flavio On Jun 28, 2012, at 10:08 AM, Uma Maheswara Rao G wrote: > Yes, Flavio there is no duplication of code. > Fragment replication part I moved from BKAdmin to LedgerFragmentReplicator > class.See initial patch at BK-299. > This is just like a helper class for BKadmin for replicating Fragment. > Directly I used this FragmentReplicator part for Replication Wroker. > > Regards, > Uma > ________________________________________ > From: Flavio Junqueira [[email protected]] > Sent: Thursday, June 28, 2012 3:10 AM > To: [email protected] > Cc: Ivan Kelly; Rakesh R > Subject: Re: Race condition between LedgerChecker and Ensemble reformation > from client > > This discussion made me wonder about the relation between the bookie recovery > tool and the auto-recovery feature. Does the latter replace the former? Also, > if they share code, we want to avoid duplication, yes? > > -Flavio > > On Jun 27, 2012, at 4:17 PM, Uma Maheswara Rao G wrote: > >> Thanks Ivan and Flavio. >> >> I got the point. >> >> >> >> And Yes, I have seen this with below mentioned steps. Infact, I used only >> very lower part of the code from BKAdmin.(Only fragment replication part) >> That will not have this prventing steps because that is only responsible for >> fragment. >> >> Need to build in ReplicationWorker. >> >> You mean that, >> >>> 1. If the failed bookie is not in the last ensemble of the ledger, >> recover as normal. >> fine. >> >> 2. If the failed bookie is in the last ensemble of the ledger, we >> reopen the ledger using fencing. This stops the client from writing >> any further entries to the ledger. Then recovery can continue as if >> the ledger had already been closed. >> This can make the NN to switch right? >> >> I think we should have some delay for replication work to trigger. Otherwise >> every ensemble change may enable RW to fence the ledger right? Infact >> session timeout should help here. though there is an other case where delay >> will not help. Ledger already marked as UR bacause of some BK in previous >> enseble. That can trigger RW to scan ledger to find fragments. In my case we >> are keep shutting down the BKs and starting after some time. >> >> >> >> Regards, >> >> Uma >> >> ________________________________________ >> From: Flavio Junqueira [[email protected]] >> Sent: Wednesday, June 27, 2012 7:15 PM >> To: [email protected] >> Cc: Ivan Kelly; Rakesh R >> Subject: Re: Race condition between LedgerChecker and Ensemble reformation >> from client >> >> Hi Uma, Check the whole paragraph: Consequently, we restrict the recovery >> tool to only perform changes to the metadata when the ledger is closed or >> when the ledger writer has detected the bookie crash, has replaced it, and >> reflected the change in the metadata. >> >> It is not only for closed ledgers. >> >> -Flavio >> >> On Jun 27, 2012, at 1:40 PM, Uma Maheswara Rao G wrote: >> >>> Thanks a lot, Flavio for reference. >>> >>> Here we are making use of RecoveryTool code. >>> >>> Also I have seen in the doc saying: >>> Consequently, we restrict the recovery tool to only perform changes to the >>> metadata when >>> the ledger is closed >>> >>> >>> >>> In BOOKKEEPER-112 , Client is trying to handle this metadat failure case. >>> But still there is a case it can not handle. >>> >>> Here is the case : >>> >>> When one BK failed from ensemble it will try to update the ensemble >>> with new BK. >>> >>> >>> >>> CLIENT STEP 1: ex: 10 x y z -->10 x a z >>> >>> >>> >>> BETWEEN Step1 and Between Step2: >>> >>> At this stage , If RT runs, it may thing that there is missed entry, >>> because a does not have the entry written yet. It may replace with new BK >>> again by copying that missed entry. >>> >>> AutoRT updated ensemble ----> 10 x b z >>> >>> >>> >>> >>> >>> CLINET STEP2: And start writing the failed entry to pending BKs, >>> unfortunately again it will try to update ensemble, but whatver ensemble >>> knows by client is '10 x a z' >>> >>> >>> >>> Now metadata updation should fail as it got changed RT. >>> >>> >>> >>> In this case resolve conflicts obiously can not be solved. will be closed as >>> >>> 10 x b z >>> >>> 9 CLOSED >>> >>> >>> >>> Falvio, Ivan and Sijie What about your opinion on this case? >>> >>> >>> >>> >>> >>> Should be ok to skip OPENED ledgers? as standby will do rolling for every 2 >>> mins. So, 2mins data may be in OPENED ledger. >>> >>> Let's check for other scenarios as well. >>> >>> >>> >>> >>> >>> Regards, >>> >>> Uma >>> >>> >>> >>> ________________________________________ >>> From: Flavio Junqueira [[email protected]] >>> Sent: Wednesday, June 27, 2012 12:15 PM >>> To: [email protected] >>> Cc: Ivan Kelly; Rakesh R >>> Subject: Re: Race condition between LedgerChecker and Ensemble reformation >>> from client >>> >>> Hi Uma, We have had a related issue in BOOKKEEPER-112 and there is a doc >>> there describing how we deal with it. It might help to give it a look. >>> >>> -Flavio >>> >>> On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote: >>> >>>> Right. But Current Replication process considered for OPEN ledgers also. >>>> So, Ledger checker can not know whether that ensemble is just reformed by >>>> client or inprogress for write. >>>> >>>> One way is to skip the replication for Inprogress Ledgers. But Auditor may >>>> need to recheck this opened ledgers periodically which ever it came across? >>>> >>>> IMO, replicating inrprogress ledgers may create some inconsistencies. >>>> >>>> Thanks, >>>> Uma >>>> ________________________________________ >>>> From: Flavio Junqueira [[email protected]] >>>> Sent: Wednesday, June 27, 2012 4:21 AM >>>> To: [email protected] >>>> Cc: Ivan Kelly; Rakesh R >>>> Subject: Re: Race condition between LedgerChecker and Ensemble >>>> reformation from client >>>> >>>> Hi Uma, It sounds like the replication worker shouldn't have written: >>>> >>>> 401 10.18.40.155:3181 10.18.40.155:3185 >>>> 10.18.40.155:3184 >>>> >>>> If I'm not missing anything, the replication worker should update an >>>> existing entry in the metadata, not create a new entry. >>>> >>>> -Flavio >>>> >>>> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote: >>>> >>>>> Hi, >>>>> >>>>> It looks there is a race between LedgerChecker and Ensemble reformation >>>>> from client. >>>>> >>>>> When one bookie failed from ensemble quoram, it will try to reform the >>>>> ensemble on handleBookieFailure. >>>>> >>>>> At this time it is reforming the ensemble and resending the write request >>>>> to new bookie (which is added into new ensemble.) >>>>> >>>>> At the same time if, If ReplicationWroker triggers on same ledger and run >>>>> the LedgerChecker on it. >>>>> LedgerChecker may find this last failed entry also as a fragment, because >>>>> ensemble change already updated in metadata. >>>>> >>>>> If ReplicationWorker replicate this last fragment, then >>>>> ChangeEnsembleCb#operationComplete will fail with Badversion, because >>>>> ensemble data already updated by ReplicationWorker. >>>>> >>>>> >>>>> LOG.error("Could not resolve ledger metadata conflict while changing >>>>> ensemble to: " >>>>> + newEnsemble + ", old >>>>> meta data is \n" + new String(metadata.serialize()) >>>>> + "\n, new meta data is >>>>> \n" + new String(newMeta.serialize()) + "\n ,closing ledger"); >>>>> >>>>> 2012-06-23 10:51:47,814 - ERROR >>>>> [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not >>>>> resolve ledger metadata conflict while changing ensemble to: >>>>> [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta >>>>> data is >>>>> BookieMetadataFormatVersion 1 >>>>> 2 >>>>> 3 >>>>> 0 >>>>> 0 10.18.40.155:3181 10.18.40.155:3182 >>>>> 10.18.40.155:3183 >>>>> 102 10.18.40.155:3181 10.18.40.155:3185 >>>>> 10.18.40.155:3183 >>>>> , new meta data is >>>>> BookieMetadataFormatVersion 1 >>>>> 2 >>>>> 3 >>>>> 0 >>>>> 0 10.18.40.155:3181 10.18.40.155:3182 >>>>> 10.18.40.155:3183 >>>>> 102 10.18.40.155:3181 10.18.40.155:3185 >>>>> 10.18.40.155:3183 >>>>> 401 10.18.40.155:3181 10.18.40.155:3185 >>>>> 10.18.40.155:3184 >>>>> ,closing ledger >>>>> >>>>> >>>>> After this time, it will close the ledger. >>>>> asyncCloseInternal(NoopCloseCallback.instance, null, rc); >>>>> >>>>> Then finally ledger metadata will looks like: >>>>> >>>>> 0 10.18.40.155:3181 10.18.40.155:3182 >>>>> 10.18.40.155:3183 >>>>> 102 10.18.40.155:3181 10.18.40.155:3185 >>>>> 10.18.40.155:3183 >>>>> 401 10.18.40.155:3181 10.18.40.155:3185 >>>>> 10.18.40.155:3184 >>>>> 400 CLOSED >>>>> >>>>> Because client known last succussful entry is 400. Am i missing some >>>>> thing here? >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Uma >>>>> >>>>> >>>>> >>>>>
