Thanks a lot, Flavio, for the reference.
Here we are making use of the RecoveryTool code.
I have also seen the doc say:
"Consequently, we restrict the recovery tool to only perform changes to the
metadata when the ledger is closed"
In BOOKKEEPER-112, the client tries to handle this metadata failure case, but
there is still a case it cannot handle.
Here is the case:
When one BK in the ensemble fails, the client tries to update the ensemble
with a new BK.
CLIENT STEP 1: e.g. 10 x y z --> 10 x a z
BETWEEN STEP 1 AND STEP 2:
At this stage, if the RT runs, it may think that there is a missing entry,
because a does not have the entry written yet. It may then replace a with yet
another new BK, copying the missing entry to it.
AutoRT updated ensemble ----> 10 x b z
CLIENT STEP 2: The client starts writing the failed entry to the pending BKs.
Unfortunately, it then tries to update the ensemble again, but the ensemble
the client knows about is still '10 x a z'.
Now the metadata update should fail, because the metadata was changed by the RT.
In this case the conflict obviously cannot be resolved, and the ledger will be
closed as
10 x b z
9 CLOSED
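
To make the interleaving concrete, here is a minimal sketch of the race. It
assumes a hypothetical in-memory VersionedMetadata class standing in for the
versioned ledger metadata; it is not the actual BookKeeper or ZooKeeper API,
and the second replacement bookie "c" is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class EnsembleRaceSketch {
    /** Hypothetical stand-in for the versioned ledger metadata. */
    static final class VersionedMetadata {
        private List<String> ensemble;
        private int version = 0;

        VersionedMetadata(List<String> initial) {
            this.ensemble = initial;
        }

        synchronized int version() {
            return version;
        }

        synchronized List<String> ensemble() {
            return ensemble;
        }

        /** Conditional write: succeeds only if the caller saw the latest version. */
        synchronized boolean compareAndSet(int expectedVersion, List<String> newEnsemble) {
            if (expectedVersion != version) {
                return false; // analogous to a BadVersion failure
            }
            ensemble = newEnsemble;
            version++;
            return true;
        }
    }

    /** Replays the interleaving; returns true if the client's second update is rejected. */
    static boolean replayRace(VersionedMetadata meta) {
        // CLIENT STEP 1: x y z --> x a z
        boolean step1 = meta.compareAndSet(meta.version(), Arrays.asList("x", "a", "z"));
        int clientVersion = meta.version(); // the version the client remembers

        // RT runs between the two client steps: x a z --> x b z
        boolean rt = meta.compareAndSet(meta.version(), Arrays.asList("x", "b", "z"));

        // CLIENT STEP 2: the client still believes the ensemble is x a z
        // ("c" is a hypothetical replacement bookie)
        boolean step2 = meta.compareAndSet(clientVersion, Arrays.asList("x", "c", "z"));

        return step1 && rt && !step2;
    }

    public static void main(String[] args) {
        VersionedMetadata meta = new VersionedMetadata(Arrays.asList("x", "y", "z"));
        System.out.println("client update rejected: " + replayRace(meta));
        System.out.println("final ensemble: " + meta.ensemble());
    }
}
```

Running main prints "client update rejected: true" and "final ensemble:
[x, b, z]", which matches the outcome above: the RT's ensemble wins and the
client is left with a conflict it cannot resolve.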
Flavio, Ivan and Sijie, what is your opinion on this case?
Should it be OK to skip OPENED ledgers? The standby rolls the ledger every 2
minutes, so 2 minutes of data may be in an OPENED ledger.
Let's check the other scenarios as well.
Regards,
Uma
________________________________________
From: Flavio Junqueira [[email protected]]
Sent: Wednesday, June 27, 2012 12:15 PM
To: [email protected]
Cc: Ivan Kelly; Rakesh R
Subject: Re: Race condition between LedgerChecker and Ensemble reformation
from client
Hi Uma, We have had a related issue in BOOKKEEPER-112 and there is a doc there
describing how we deal with it. It might help to give it a look.
-Flavio
On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote:
> Right. But the current replication process considers OPEN ledgers as well. So
> the LedgerChecker cannot know whether the ensemble was just reformed by the
> client or a write is still in progress.
>
> One way is to skip replication for in-progress ledgers. But the Auditor may
> need to periodically recheck the open ledgers it has come across?
>
> IMO, replicating in-progress ledgers may create some inconsistencies.
>
> Thanks,
> Uma
> ________________________________________
> From: Flavio Junqueira [[email protected]]
> Sent: Wednesday, June 27, 2012 4:21 AM
> To: [email protected]
> Cc: Ivan Kelly; Rakesh R
> Subject: Re: Race condition between LedgerChecker and Ensemble reformation
> from client
>
> Hi Uma, It sounds like the replication worker shouldn't have written:
>
> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>
> If I'm not missing anything, the replication worker should update an existing
> entry in the metadata, not create a new entry.
>
> -Flavio
>
> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
>
>> Hi,
>>
>> It looks like there is a race between LedgerChecker and ensemble reformation
>> from the client.
>>
>> When one bookie in the ensemble quorum fails, the client tries to reform the
>> ensemble in handleBookieFailure.
>>
>> At this point it reforms the ensemble and resends the write request to the
>> new bookie (which has been added to the new ensemble).
>>
>> If, at the same time, the ReplicationWorker triggers on the same ledger and
>> runs the LedgerChecker on it, the LedgerChecker may treat this last failed
>> entry as a fragment too, because the ensemble change has already been
>> written to the metadata.
>>
>> If the ReplicationWorker replicates this last fragment, then
>> ChangeEnsembleCb#operationComplete will fail with BadVersion, because the
>> ensemble data has already been updated by the ReplicationWorker.
>>
>>
>> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
>>
>> 2012-06-23 10:51:47,814 - ERROR
>> [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not
>> resolve ledger metadata conflict while changing ensemble to:
>> [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta data
>> is
>> BookieMetadataFormatVersion 1
>> 2
>> 3
>> 0
>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>> , new meta data is
>> BookieMetadataFormatVersion 1
>> 2
>> 3
>> 0
>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>> ,closing ledger
>>
>>
>> After this, it will close the ledger:
>> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
>>
>> Then the final ledger metadata will look like:
>>
>> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
>> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
>> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
>> 400 CLOSED
>>
>> Because the last successful entry known to the client is 400. Am I missing
>> something here?
>>
>>
>>
>>
>>
>> Regards,
>>
>> Uma
>>
>>
>>
>>