Thanks a lot for the reference, Flavio.

Here we are making use of the RecoveryTool code.

I have also seen that the doc says:
  Consequently, we restrict the recovery tool to only perform changes to the
  metadata when the ledger is closed



In BOOKKEEPER-112, the client tries to handle this metadata failure case, but
there is still a case it cannot handle.

Here is the case:

When one BK in the ensemble fails, the client tries to update the ensemble
with a new BK.



CLIENT STEP 1: e.g. 10  x y z  --> 10  x a z



BETWEEN STEP 1 AND STEP 2:

At this stage, if the RT runs, it may think that there is a missing entry,
because a does not have the entry written yet. It may then replace a with yet
another new BK, copying the missing entry to it.

AutoRT updated ensemble ----> 10 x b z





CLIENT STEP 2: The client starts writing the failed entry to the pending BKs.
Unfortunately, it will again try to update the ensemble, but the ensemble
known to the client is '10 x a z'.



Now the metadata update should fail, as the metadata was changed by the RT.



In this case the conflict obviously cannot be resolved, and the ledger will be
closed as:

10  x b z

9    CLOSED
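
To make the interleaving above concrete, here is a minimal sketch of it as a
version-checked metadata update. This is a toy model, not BookKeeper's actual
API: MetadataStore, compareAndSet and runScenario are hypothetical names, and
the bookie c used in client step 2 is an assumption added only so the client
has a second update to attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of version-checked ledger metadata; not BookKeeper's real API. */
public class EnsembleRaceSketch {

    /** Hypothetical stand-in for the metadata znode: an ensemble plus a version. */
    public static final class MetadataStore {
        private List<String> ensemble;
        private int version = 0;

        public MetadataStore(List<String> initial) {
            this.ensemble = new ArrayList<>(initial);
        }

        /** Conditional write: succeeds only if the caller read the current version. */
        public synchronized boolean compareAndSet(int expectedVersion, List<String> newEnsemble) {
            if (expectedVersion != version) {
                return false; // plays the role of a BadVersion failure
            }
            ensemble = new ArrayList<>(newEnsemble);
            version++;
            return true;
        }

        public synchronized int version() {
            return version;
        }

        public synchronized List<String> ensemble() {
            return new ArrayList<>(ensemble);
        }
    }

    /** Replays the three writes from the scenario and returns their outcomes in order. */
    public static boolean[] runScenario(MetadataStore store) {
        // Client step 1: y failed, the client replaces it with a.
        boolean step1 = store.compareAndSet(store.version(), Arrays.asList("x", "a", "z"));
        int clientVersion = store.version(); // the client now believes "x a z"

        // Between step 1 and step 2: the RT sees the entry missing on a and swaps in b.
        boolean rt = store.compareAndSet(store.version(), Arrays.asList("x", "b", "z"));

        // Client step 2: another update based on the stale "x a z" view
        // (c is a hypothetical replacement bookie).
        boolean step2 = store.compareAndSet(clientVersion, Arrays.asList("x", "a", "c"));

        return new boolean[] {step1, rt, step2};
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore(Arrays.asList("x", "y", "z"));
        boolean[] r = runScenario(store);
        // prints: step1=true rt=true step2=false final=[x, b, z]
        System.out.println("step1=" + r[0] + " rt=" + r[1] + " step2=" + r[2]
                + " final=" + store.ensemble());
    }
}
```

The client's second write is rejected because its version is stale, and the
stored ensemble stays at x b z, which matches the closed state shown above.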



Flavio, Ivan and Sijie, what is your opinion on this case?





Would it be OK to skip OPENED ledgers? The standby rolls every 2 minutes, so
up to 2 minutes of data may be in an OPENED ledger.

Let's check for other scenarios as well.





Regards,

Uma



________________________________________
From: Flavio Junqueira [[email protected]]
Sent: Wednesday, June 27, 2012 12:15 PM
To: [email protected]
Cc: Ivan Kelly; Rakesh R
Subject: Re: Race condition between  LedgerChecker and Ensemble reformation 
from client

Hi Uma, We have had a related issue in BOOKKEEPER-112 and there is a doc there 
describing how we deal with it. It might help to give it a look.

-Flavio

On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote:

> Right. But the current replication process considers OPEN ledgers as well.
> So the LedgerChecker cannot know whether the ensemble was just reformed by
> the client or a write is in progress.
>
> One way is to skip replication for in-progress ledgers. But the Auditor may
> need to periodically recheck whichever open ledgers it came across?
>
> IMO, replicating in-progress ledgers may create some inconsistencies.
>
> Thanks,
> Uma
> ________________________________________
> From: Flavio Junqueira [[email protected]]
> Sent: Wednesday, June 27, 2012 4:21 AM
> To: [email protected]
> Cc: Ivan Kelly; Rakesh R
> Subject: Re: Race condition between  LedgerChecker and Ensemble reformation 
> from client
>
> Hi Uma, It sounds like the replication worker shouldn't have written:
>
> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
>
> If I'm not missing anything, the replication worker should update an existing 
> entry in the metadata, not create a new entry.
>
> -Flavio
>
> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
>
>> Hi,
>>
>> It looks like there is a race between LedgerChecker and ensemble
>> reformation from the client.
>>
>> When one bookie in the ensemble quorum fails, the client will try to reform
>> the ensemble in handleBookieFailure.
>>
>> At this point it is reforming the ensemble and resending the write request
>> to the new bookie (which has been added to the new ensemble).
>>
>> At the same time, the ReplicationWorker may trigger on the same ledger and
>> run the LedgerChecker on it.
>> The LedgerChecker may find this last failed entry as a fragment too, because
>> the ensemble change has already been updated in the metadata.
>>
>> If the ReplicationWorker replicates this last fragment, then
>> ChangeEnsembleCb#operationComplete will fail with BadVersion, because the
>> ensemble data has already been updated by the ReplicationWorker.
>>
>>
>> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
>>
>> 2012-06-23 10:51:47,814 - ERROR 
>> [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not 
>> resolve ledger metadata conflict while changing ensemble to: 
>> [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta data 
>> is
>> BookieMetadataFormatVersion        1
>> 2
>> 3
>> 0
>> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
>> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
>> , new meta data is
>> BookieMetadataFormatVersion        1
>> 2
>> 3
>> 0
>> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
>> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
>> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
>> ,closing ledger
>>
>>
>> After this, it will close the ledger:
>> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
>>
>> Then the final ledger metadata will look like:
>>
>> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
>> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
>> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
>> 400   CLOSED
>>
>> Because the last successful entry known to the client is 400. Am I missing
>> something here?
>>
>>
>>
>>
>>
>> Regards,
>>
>> Uma
>>
>>
>>
>>
