Agreed on keeping the recovery tool until the auto-replication work has stabilized and is under production testing.

On Thu, Jun 28, 2012 at 10:44 PM, Uma Maheswara Rao G <[email protected]> wrote:

> Hi Flavio,
>
> Once the AutoReplicationWork is complete and stabilized, the clusters
> where we have enabled autoRecovery will not require the RecoveryTool
> any more.
>
> Since we are currently making autoRecovery optional, some users may
> want to run the RecoveryTool on demand and may not enable autoRecovery.
>
> Once we complete and stabilize the AutoReplicationWork, maybe we can go
> for a vote?
>
> Regards,
> Uma
>
> ________________________________________
> From: Flavio Junqueira [[email protected]]
> Sent: Thursday, June 28, 2012 7:20 PM
> To: [email protected]
> Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client
>
> That's cool. I'm still wondering about the bookie recovery tool. Is there
> still a need for such a tool or the replication scheme will supersede it?
> What's your opinion?
>
> -Flavio
>
> On Jun 28, 2012, at 10:08 AM, Uma Maheswara Rao G wrote:
>
> > Yes, Flavio, there is no duplication of code.
> > I moved the fragment replication part from BKAdmin to the
> > LedgerFragmentReplicator class. See the initial patch at BK-299.
> > This is just a helper class that BKAdmin uses for replicating a
> > fragment. I used this FragmentReplicator part directly for the
> > ReplicationWorker.
> >
> > Regards,
> > Uma
> > ________________________________________
> > From: Flavio Junqueira [[email protected]]
> > Sent: Thursday, June 28, 2012 3:10 AM
> > To: [email protected]
> > Cc: Ivan Kelly; Rakesh R
> > Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client
> >
> > This discussion made me wonder about the relation between the bookie
> > recovery tool and the auto-recovery feature. Does the latter replace the
> > former? Also, if they share code, we want to avoid duplication, yes?
> >
> > -Flavio
> >
> > On Jun 27, 2012, at 4:17 PM, Uma Maheswara Rao G wrote:
> >
> >> Thanks Ivan and Flavio.
> >>
> >> I got the point.
> >>
> >> And yes, I have seen this with the steps mentioned below. In fact, I
> >> used only a very low-level part of the code from BKAdmin (only the
> >> fragment replication part). That part does not have these preventing
> >> steps because it is only responsible for fragments.
> >>
> >> We need to build them into the ReplicationWorker.
> >>
> >> You mean that,
> >>
> >>> 1. If the failed bookie is not in the last ensemble of the ledger,
> >>> recover as normal.
> >> fine.
> >>
> >> 2. If the failed bookie is in the last ensemble of the ledger, we
> >> reopen the ledger using fencing. This stops the client from writing
> >> any further entries to the ledger. Then recovery can continue as if
> >> the ledger had already been closed.
> >> This can make the NN switch, right?
> >>
> >> I think we should have some delay before the replication work
> >> triggers. Otherwise every ensemble change may cause the RW to fence
> >> the ledger, right? In fact, the session timeout should help here,
> >> though there is another case where a delay will not help: the ledger
> >> is already marked as UR because of some BK in a previous ensemble.
> >> That can trigger the RW to scan the ledger to find fragments. In my
> >> case we keep shutting down the BKs and starting them again after some
> >> time.
> >>
> >> Regards,
> >>
> >> Uma
> >>
> >> ________________________________________
> >> From: Flavio Junqueira [[email protected]]
> >> Sent: Wednesday, June 27, 2012 7:15 PM
> >> To: [email protected]
> >> Cc: Ivan Kelly; Rakesh R
> >> Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client
> >>
> >> Hi Uma, Check the whole paragraph: Consequently, we restrict the
> >> recovery tool to only perform changes to the metadata when the ledger
> >> is closed or when the ledger writer has detected the bookie crash, has
> >> replaced it, and reflected the change in the metadata.
> >>
> >> It is not only for closed ledgers.
> >>
> >> -Flavio
> >>
> >> On Jun 27, 2012, at 1:40 PM, Uma Maheswara Rao G wrote:
> >>
> >>> Thanks a lot, Flavio, for the reference.
> >>>
> >>> Here we are making use of the RecoveryTool code.
> >>>
> >>> Also, I have seen the doc saying:
> >>> Consequently, we restrict the recovery tool to only perform changes
> >>> to the metadata when the ledger is closed
> >>>
> >>> In BOOKKEEPER-112, the client tries to handle this metadata failure
> >>> case. But there is still a case it cannot handle.
> >>>
> >>> Here is the case:
> >>>
> >>> When one BK in the ensemble fails, the client will try to update the
> >>> ensemble with a new BK.
> >>>
> >>> CLIENT STEP 1: ex: 10 x y z --> 10 x a z
> >>>
> >>> BETWEEN STEP 1 AND STEP 2:
> >>>
> >>> At this stage, if RT runs, it may think that there is a missed entry,
> >>> because a does not have the entry written yet. It may replace a with
> >>> a new BK again by copying that missed entry.
> >>>
> >>> AutoRT updated ensemble ----> 10 x b z
> >>>
> >>> CLIENT STEP 2: The client starts writing the failed entry to the
> >>> pending BKs. Unfortunately, it will again try to update the ensemble,
> >>> but the ensemble known to the client is '10 x a z'.
> >>>
> >>> Now the metadata update should fail, as the metadata was changed by
> >>> RT.
> >>>
> >>> In this case the conflict obviously cannot be resolved. The ledger
> >>> will be closed as
> >>>
> >>> 10 x b z
> >>>
> >>> 9 CLOSED
> >>>
> >>> Flavio, Ivan and Sijie, what is your opinion on this case?
> >>>
> >>> Should it be OK to skip OPENED ledgers? The standby will do rolling
> >>> every 2 mins, so 2 mins of data may be in an OPENED ledger.
> >>>
> >>> Let's check for other scenarios as well.
> >>>
> >>> Regards,
> >>>
> >>> Uma
> >>>
> >>> ________________________________________
> >>> From: Flavio Junqueira [[email protected]]
> >>> Sent: Wednesday, June 27, 2012 12:15 PM
> >>> To: [email protected]
> >>> Cc: Ivan Kelly; Rakesh R
> >>> Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client
> >>>
> >>> Hi Uma, We have had a related issue in BOOKKEEPER-112 and there is a
> >>> doc there describing how we deal with it. It might help to give it a
> >>> look.
> >>>
> >>> -Flavio
> >>>
> >>> On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote:
> >>>
> >>>> Right. But the current replication process also considers OPEN
> >>>> ledgers. So the LedgerChecker cannot know whether an ensemble was
> >>>> just reformed by the client or is in progress for a write.
> >>>>
> >>>> One way is to skip replication for in-progress ledgers. But the
> >>>> Auditor may then need to periodically recheck whichever opened
> >>>> ledgers it came across?
> >>>>
> >>>> IMO, replicating in-progress ledgers may create some
> >>>> inconsistencies.
> >>>>
> >>>> Thanks,
> >>>> Uma
> >>>> ________________________________________
> >>>> From: Flavio Junqueira [[email protected]]
> >>>> Sent: Wednesday, June 27, 2012 4:21 AM
> >>>> To: [email protected]
> >>>> Cc: Ivan Kelly; Rakesh R
> >>>> Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client
> >>>>
> >>>> Hi Uma, It sounds like the replication worker shouldn't have written:
> >>>>
> >>>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
> >>>>
> >>>> If I'm not missing anything, the replication worker should update an
> >>>> existing entry in the metadata, not create a new entry.
> >>>>
> >>>> -Flavio
> >>>>
> >>>> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> It looks like there is a race between the LedgerChecker and ensemble
> >>>>> reformation from the client.
> >>>>>
> >>>>> When one bookie in the ensemble quorum fails, the client tries to
> >>>>> reform the ensemble in handleBookieFailure.
> >>>>>
> >>>>> At this time it is reforming the ensemble and resending the write
> >>>>> request to the new bookie (which was added to the new ensemble).
> >>>>>
> >>>>> If, at the same time, the ReplicationWorker triggers on the same
> >>>>> ledger and runs the LedgerChecker on it, the LedgerChecker may also
> >>>>> report this last failed entry as a fragment, because the ensemble
> >>>>> change has already been written to the metadata.
> >>>>>
> >>>>> If the ReplicationWorker replicates this last fragment, then
> >>>>> ChangeEnsembleCb#operationComplete will fail with BadVersion, because
> >>>>> the ensemble data has already been updated by the ReplicationWorker.
> >>>>>
> >>>>> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
> >>>>>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
> >>>>>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
> >>>>>
> >>>>> 2012-06-23 10:51:47,814 - ERROR [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not resolve ledger metadata conflict while changing ensemble to: [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta data is
> >>>>> BookieMetadataFormatVersion 1
> >>>>> 2
> >>>>> 3
> >>>>> 0
> >>>>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
> >>>>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
> >>>>> , new meta data is
> >>>>> BookieMetadataFormatVersion 1
> >>>>> 2
> >>>>> 3
> >>>>> 0
> >>>>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
> >>>>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
> >>>>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
> >>>>> ,closing ledger
> >>>>>
> >>>>> After this, it closes the ledger:
> >>>>> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
> >>>>>
> >>>>> The final ledger metadata then looks like:
> >>>>>
> >>>>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
> >>>>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
> >>>>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
> >>>>> 400 CLOSED
> >>>>>
> >>>>> This is because the last successful entry known to the client is
> >>>>> 400. Am I missing something here?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Uma
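
For anyone reading this thread later, the x/y/z scenario quoted above can be sketched as a conditional (versioned) metadata write. This is only a minimal illustration under stated assumptions: `VersionedMetadata`, `Outcome`, and `runScenario` are hypothetical names made up for this note, the store is a plain in-memory object, and the bookie `c` in the last step is invented for the example. None of it is BookKeeper's or ZooKeeper's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EnsembleRaceSketch {

    /** Hypothetical in-memory stand-in for the ledger metadata store. */
    static final class VersionedMetadata {
        private List<String> ensemble;
        private int version = 0;

        VersionedMetadata(List<String> initial) {
            this.ensemble = new ArrayList<>(initial);
        }

        synchronized int version() {
            return version;
        }

        synchronized List<String> ensemble() {
            return new ArrayList<>(ensemble);
        }

        /** Conditional write: applied only if the caller saw the latest version. */
        synchronized boolean update(int expectedVersion, List<String> newEnsemble) {
            if (expectedVersion != version) {
                return false; // plays the role of a bad-version failure
            }
            ensemble = new ArrayList<>(newEnsemble);
            version++;
            return true;
        }
    }

    /** Result of each step in the simulated interleaving. */
    static final class Outcome {
        final boolean clientStep1Ok;
        final boolean recoveryOk;
        final boolean clientStep2Ok;
        final List<String> finalEnsemble;

        Outcome(boolean s1, boolean r, boolean s2, List<String> finalEnsemble) {
            this.clientStep1Ok = s1;
            this.recoveryOk = r;
            this.clientStep2Ok = s2;
            this.finalEnsemble = finalEnsemble;
        }
    }

    static Outcome runScenario() {
        VersionedMetadata meta = new VersionedMetadata(Arrays.asList("x", "y", "z"));

        // Client step 1: bookie y failed, the client replaces it with a.
        boolean step1 = meta.update(meta.version(), Arrays.asList("x", "a", "z"));
        int clientVersion = meta.version(); // the client's view: "x a z"

        // Between the steps: recovery sees a missing entry on a and swaps in b.
        boolean recovery = meta.update(meta.version(), Arrays.asList("x", "b", "z"));

        // Client step 2: another ensemble change based on the stale view
        // (c is an invented replacement bookie).
        boolean step2 = meta.update(clientVersion, Arrays.asList("x", "c", "z"));

        return new Outcome(step1, recovery, step2, meta.ensemble());
    }

    public static void main(String[] args) {
        Outcome o = runScenario();
        System.out.println("client step 1 ok: " + o.clientStep1Ok);
        System.out.println("recovery ok:      " + o.recoveryOk);
        System.out.println("client step 2 ok: " + o.clientStep2Ok);
        System.out.println("final ensemble:   " + o.finalEnsemble);
    }
}
```

In this sketch the client's second write is rejected because the stored version moved underneath it, which mirrors the conflict described in the thread. Whether to fence, delay, or skip open ledgers to avoid the interleaving is the design question discussed above; the sketch does not take a position on it.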
