Hi Uma,

We had a related issue in BOOKKEEPER-112, and there is a doc there describing how we dealt with it. It might help to take a look.

-Flavio

On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote:

> Right. But the current replication process also considers OPEN ledgers, so
> the LedgerChecker cannot know whether that ensemble was just re-formed by the
> client or is still in progress for writes.
>
> One way is to skip replication for in-progress ledgers. But then the Auditor
> may need to periodically recheck whichever open ledgers it came across?
>
> IMO, replicating in-progress ledgers may create some inconsistencies.
>
> Thanks,
> Uma
> ________________________________________
> From: Flavio Junqueira [[email protected]]
> Sent: Wednesday, June 27, 2012 4:21 AM
> To: [email protected]
> Cc: Ivan Kelly; Rakesh R
> Subject: Re: Race condition between LedgerChecker and Ensemble reformation from client
>
> Hi Uma,
>
> It sounds like the replication worker shouldn't have written:
>
> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>
> If I'm not missing anything, the replication worker should update an existing
> entry in the metadata, not create a new entry.
>
> -Flavio
>
> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
>
>> Hi,
>>
>> It looks like there is a race between the LedgerChecker and ensemble
>> reformation by the client.
>>
>> When one bookie in the ensemble quorum fails, the client tries to re-form
>> the ensemble in handleBookieFailure.
>>
>> While doing so, it re-forms the ensemble and resends the write request to
>> the new bookie (the one added to the new ensemble).
>>
>> If, at the same time, the ReplicationWorker is triggered on the same ledger
>> and runs the LedgerChecker on it, the LedgerChecker may also treat this last
>> failed entry as a fragment, because the ensemble change has already been
>> recorded in the metadata.
>>
>> If the ReplicationWorker replicates this last fragment, then
>> ChangeEnsembleCb#operationComplete will fail with BadVersion, because the
>> ensemble data has already been updated by the ReplicationWorker.
>>
>> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>>         + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>>         + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
>>
>> 2012-06-23 10:51:47,814 - ERROR [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not resolve ledger metadata conflict while changing ensemble to: [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta data is
>> BookieMetadataFormatVersion 1
>> 2
>> 3
>> 0
>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>> , new meta data is
>> BookieMetadataFormatVersion 1
>> 2
>> 3
>> 0
>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>> ,closing ledger
>>
>> After this, the client closes the ledger:
>>
>>     asyncCloseInternal(NoopCloseCallback.instance, null, rc);
>>
>> The final ledger metadata then looks like:
>>
>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>> 400 CLOSED
>>
>> This is because the last entry the client knows to be successful is 400.
>> Am I missing something here?
>>
>> Regards,
>>
>> Uma
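The BadVersion failure in ChangeEnsembleCb comes from versioned (compare-and-set) metadata updates: the client reads the ledger metadata at some version, the ReplicationWorker writes first and bumps the version, and the client's conditional write is then rejected. The minimal Java sketch below replays only that interleaving. `MetadataStore`, `compareAndSet`, and `clientEnsembleChangeSucceeds` are hypothetical stand-ins, not BookKeeper or ZooKeeper APIs; the entry IDs and bookie addresses are taken from the log, and starting the client's new fragment at entry 401 is an assumption.

```java
import java.util.List;
import java.util.TreeMap;

public class EnsembleRaceSketch {

    /** Hypothetical in-memory stand-in for a versioned metadata store; not a BookKeeper or ZooKeeper API. */
    static final class MetadataStore {
        private int version = 0;
        // first entry ID of each fragment -> bookies in that fragment's ensemble
        private TreeMap<Long, List<String>> ensembles = new TreeMap<>();

        synchronized int version() {
            return version;
        }

        synchronized TreeMap<Long, List<String>> snapshot() {
            return new TreeMap<>(ensembles);
        }

        /** Applies the update only if expectedVersion is current; returning false models BadVersion. */
        synchronized boolean compareAndSet(int expectedVersion, TreeMap<Long, List<String>> updated) {
            if (expectedVersion != version) {
                return false;
            }
            ensembles = new TreeMap<>(updated);
            version++;
            return true;
        }
    }

    /** Replays the interleaving from the thread; returns whether the client's ensemble change was accepted. */
    static boolean clientEnsembleChangeSucceeds() {
        MetadataStore store = new MetadataStore();
        TreeMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, List.of("10.18.40.155:3181", "10.18.40.155:3182", "10.18.40.155:3183"));
        initial.put(102L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3183"));
        store.compareAndSet(store.version(), initial);

        // 1. Client: a write fails, so it reads the metadata (and its version) to build a new ensemble.
        int clientVersion = store.version();
        TreeMap<Long, List<String>> clientView = store.snapshot();

        // 2. ReplicationWorker: writes its own change first, which bumps the version.
        TreeMap<Long, List<String>> workerView = store.snapshot();
        workerView.put(401L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3184"));
        store.compareAndSet(store.version(), workerView);

        // 3. Client: attempts its ensemble change against the now-stale version.
        //    Starting the new fragment at entry 401 is an assumption for illustration.
        clientView.put(401L, List.of("10.18.40.155:3182", "10.18.40.155:3185", "10.18.40.155:3184"));
        return store.compareAndSet(clientVersion, clientView);
    }

    public static void main(String[] args) {
        // The client's conditional write is rejected, analogous to BadVersion in ChangeEnsembleCb.
        System.out.println("client ensemble change accepted: " + clientEnsembleChangeSucceeds());
    }
}
```

In a real deployment the version check happens server-side (ZooKeeper's `setData` with an expected version fails with a BadVersion error); the `synchronized` methods only keep this single-process sketch consistent.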

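Flavio's suggestion, that the replication worker should update an existing ensemble entry rather than add a new one, can be sketched as a pure function over the ledger's ensemble map. This is a minimal illustration assuming the metadata is modeled as a sorted map from a fragment's first entry ID to its bookie list; `replaceBookie` is a hypothetical helper, not BookKeeper's API. It mirrors the thread's example by swapping :3183 for :3184 in the fragment starting at entry 102 instead of writing a `401` entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplicationUpdateSketch {

    /**
     * Hypothetical helper: returns a copy of the ensemble map in which failedBookie is
     * replaced by newBookie in the fragment starting at fragmentFirstEntry. No new
     * fragment key is added, so the fragment boundaries chosen by the writer stay intact.
     */
    static TreeMap<Long, List<String>> replaceBookie(TreeMap<Long, List<String>> ensembles,
                                                     long fragmentFirstEntry,
                                                     String failedBookie,
                                                     String newBookie) {
        List<String> current = ensembles.get(fragmentFirstEntry);
        if (current == null) {
            throw new IllegalArgumentException("no fragment starts at entry " + fragmentFirstEntry);
        }
        int idx = current.indexOf(failedBookie);
        if (idx < 0) {
            throw new IllegalArgumentException(failedBookie + " is not in fragment " + fragmentFirstEntry);
        }
        List<String> updated = new ArrayList<>(current);
        updated.set(idx, newBookie);
        // Copy the map so the caller's view of the metadata is not mutated.
        TreeMap<Long, List<String>> result = new TreeMap<>(ensembles);
        result.put(fragmentFirstEntry, List.copyOf(updated));
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        ensembles.put(0L, List.of("10.18.40.155:3181", "10.18.40.155:3182", "10.18.40.155:3183"));
        ensembles.put(102L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3183"));

        // Re-replicate only the fragment starting at entry 102 onto :3184, for illustration.
        ensembles = replaceBookie(ensembles, 102L, "10.18.40.155:3183", "10.18.40.155:3184");

        for (Map.Entry<Long, List<String>> e : ensembles.entrySet()) {
            System.out.println(e.getKey() + " " + String.join(" ", e.getValue()));
        }
    }
}
```

Because no new fragment key is created, the worker cannot leave behind an entry like `401` that starts past the ledger's last entry; the sketch does not by itself remove the version conflict with a concurrent writer.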