That's cool. I'm still wondering about the bookie recovery tool. Is there still 
a need for such a tool, or will the replication scheme supersede it? What's your 
opinion?

-Flavio

On Jun 28, 2012, at 10:08 AM, Uma Maheswara Rao G wrote:

> Yes, Flavio, there is no duplication of code.
> I moved the fragment replication part from BKAdmin to the 
> LedgerFragmentReplicator class. See the initial patch at BK-299.
> It is just a helper class that BKAdmin uses for replicating a fragment. 
> I used this fragment replication part directly for the ReplicationWorker.
> 
> Regards,
> Uma
> ________________________________________
> From: Flavio Junqueira [[email protected]]
> Sent: Thursday, June 28, 2012 3:10 AM
> To: [email protected]
> Cc: Ivan Kelly; Rakesh R
> Subject: Re: Race condition between  LedgerChecker and Ensemble reformation 
> from client
> 
> This discussion made me wonder about the relation between the bookie recovery 
> tool and the auto-recovery feature. Does the latter replace the former? Also, 
> if they share code, we want to avoid duplication, yes?
> 
> -Flavio
> 
> On Jun 27, 2012, at 4:17 PM, Uma Maheswara Rao G wrote:
> 
>> Thanks Ivan and Flavio.
>> 
>> I got the point.
>> 
>> 
>> 
>> And yes, I have seen this with the steps mentioned below. In fact, I used only 
>> a very low-level part of the code from BKAdmin (only the fragment replication 
>> part). That part does not have these preventive steps, because it is only 
>> responsible for replicating a fragment.
>> 
>> They need to be built into ReplicationWorker.
>> 
>> You mean that,
>> 
>>> 1. If the failed bookie is not in the last ensemble of the ledger,
>> recover as normal.
>> fine.
>> 
>> 2. If the failed bookie is in the last ensemble of the ledger, we
>> reopen the ledger using fencing. This stops the client from writing
>> any further entries to the ledger. Then recovery can continue as if
>> the ledger had already been closed.
>> This can make the NN switch, right?
>> 
>> I think we should have some delay before replication work is triggered. 
>> Otherwise every ensemble change may cause the RW to fence the ledger, right? 
>> In fact, the session timeout should help here, though there is another case 
>> where a delay will not help: the ledger is already marked as UR because of 
>> some BK in a previous ensemble. That can trigger the RW to scan the ledger to 
>> find fragments. In my case we keep shutting down the BKs and starting them 
>> again after some time.
>> 
>> 
>> 
>> Regards,
>> 
>> Uma
>> 
>> ________________________________________
>> From: Flavio Junqueira [[email protected]]
>> Sent: Wednesday, June 27, 2012 7:15 PM
>> To: [email protected]
>> Cc: Ivan Kelly; Rakesh R
>> Subject: Re: Race condition between  LedgerChecker and Ensemble reformation 
>> from client
>> 
>> Hi Uma, Check the whole paragraph: Consequently, we restrict the recovery 
>> tool to only perform changes to the metadata when the ledger is closed or 
>> when the ledger writer has detected the bookie crash, has replaced it, and 
>> reflected the change in the metadata.
>> 
>> It is not only for closed ledgers.
>> 
>> -Flavio
>> 
>> On Jun 27, 2012, at 1:40 PM, Uma Maheswara Rao G wrote:
>> 
>>> Thanks a lot, Flavio, for the reference.
>>> 
>>>   Here we are making use of RecoveryTool code.
>>> 
>>> Also I have seen in the doc saying:
>>> Consequently, we restrict the recovery tool to only perform changes to the 
>>> metadata when
>>> the ledger is closed
>>> 
>>> 
>>> 
>>> In BOOKKEEPER-112, the client tries to handle this metadata failure case. 
>>> But there is still a case it cannot handle.
>>> 
>>> Here is the case:
>>> 
>>>     When one BK in the ensemble fails, the client will try to update the 
>>> ensemble with a new BK.
>>> 
>>> 
>>> 
>>> CLIENT STEP 1: e.g. 10  x y z  --> 10  x a z
>>> 
>>> 
>>> 
>>> BETWEEN STEP 1 and STEP 2:
>>> 
>>> At this stage, if RT runs, it may think that there is a missing entry, 
>>> because a does not have the entry written yet. It may replace a with a new BK 
>>> by copying that missing entry.
>>> 
>>> AutoRT updated ensemble ----> 10 x b z
>>> 
>>> 
>>> 
>>> 
>>> 
>>>  CLIENT STEP 2: The client starts writing the failed entry to the pending BKs. 
>>> Unfortunately it will again try to update the ensemble, but the ensemble 
>>> known to the client is still '10 x a z'.
>>> 
>>> 
>>> 
>>> Now the metadata update should fail, as the metadata was changed by RT.
>>> 
>>> 
>>> 
>>> In this case the conflict obviously cannot be resolved. The ledger will be closed as
>>> 
>>> 10  x b z
>>> 
>>> 9    CLOSED
>>> 
>>> 
>>> 
>>> Flavio, Ivan and Sijie, what is your opinion on this case?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Should it be OK to skip OPENED ledgers? The standby rolls every 2 minutes, 
>>> so 2 minutes of data may be in an OPENED ledger.
>>> 
>>> Let's check for other scenarios as well.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Regards,
>>> 
>>> Uma
>>> 
>>> 
>>> 
>>> ________________________________________
>>> From: Flavio Junqueira [[email protected]]
>>> Sent: Wednesday, June 27, 2012 12:15 PM
>>> To: [email protected]
>>> Cc: Ivan Kelly; Rakesh R
>>> Subject: Re: Race condition between  LedgerChecker and Ensemble reformation 
>>> from client
>>> 
>>> Hi Uma, We have had a related issue in BOOKKEEPER-112 and there is a doc 
>>> there describing how we deal with it. It might help to give it a look.
>>> 
>>> -Flavio
>>> 
>>> On Jun 27, 2012, at 7:06 AM, Uma Maheswara Rao G wrote:
>>> 
>>>> Right. But the current replication process also considers OPEN ledgers. 
>>>> So the LedgerChecker cannot know whether that ensemble was just reformed by 
>>>> the client or whether a write is still in progress.
>>>> 
>>>> One way is to skip replication for in-progress ledgers. But the Auditor may 
>>>> need to periodically recheck the open ledgers it came across?
>>>> 
>>>> IMO, replicating in-progress ledgers may create some inconsistencies.
>>>> 
>>>> Thanks,
>>>> Uma
>>>> ________________________________________
>>>> From: Flavio Junqueira [[email protected]]
>>>> Sent: Wednesday, June 27, 2012 4:21 AM
>>>> To: [email protected]
>>>> Cc: Ivan Kelly; Rakesh R
>>>> Subject: Re: Race condition between  LedgerChecker and Ensemble 
>>>> reformation from client
>>>> 
>>>> Hi Uma, It sounds like the replication worker shouldn't have written:
>>>> 
>>>> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
>>>> 
>>>> If I'm not missing anything, the replication worker should update an 
>>>> existing entry in the metadata, not create a new entry.
>>>> 
>>>> -Flavio
>>>> 
>>>> On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> It looks like there is a race between the LedgerChecker and ensemble 
>>>>> reformation from the client.
>>>>> 
>>>>> When one bookie in the ensemble quorum fails, the client will try to reform 
>>>>> the ensemble in handleBookieFailure.
>>>>> 
>>>>> At this point it is reforming the ensemble and resending the write request 
>>>>> to the new bookie (the one added to the new ensemble).
>>>>> 
>>>>> At the same time, the ReplicationWorker may trigger on the same ledger and 
>>>>> run the LedgerChecker on it.
>>>>> The LedgerChecker may treat this last failed entry as a fragment too, 
>>>>> because the ensemble change has already been updated in the metadata.
>>>>> 
>>>>> If the ReplicationWorker replicates this last fragment, then 
>>>>> ChangeEnsembleCb#operationComplete will fail with BadVersion, because 
>>>>> the ensemble data has already been updated by the ReplicationWorker.
>>>>> 
>>>>> 
>>>>> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>>>>>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>>>>>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
>>>>> 
>>>>> 2012-06-23 10:51:47,814 - ERROR [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not resolve ledger metadata conflict while changing ensemble to: [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta data is
>>>>> BookieMetadataFormatVersion        1
>>>>> 2
>>>>> 3
>>>>> 0
>>>>> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
>>>>> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
>>>>> , new meta data is
>>>>> BookieMetadataFormatVersion        1
>>>>> 2
>>>>> 3
>>>>> 0
>>>>> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
>>>>> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
>>>>> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
>>>>> ,closing ledger
>>>>> 
>>>>> 
>>>>> After this time, it will close the ledger. 
>>>>> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
>>>>> 
>>>>> Then the final ledger metadata will look like:
>>>>> 
>>>>> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
>>>>> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
>>>>> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
>>>>> 400   CLOSED
>>>>> 
>>>>> This is because the last successful entry known to the client is 400. Am I 
>>>>> missing something here?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Uma
>>>>> 
>>>>> 
>>>>> 
>>>>> 
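
The interleaving discussed in this thread (the client changes the ensemble, the replication side rewrites it, and the client's next conditional update is rejected) can be sketched with a version-checked write. This is only a minimal illustration under stated assumptions: `EnsembleRaceSketch`, `VersionedMetadata`, `compareAndSet` and `replayRace` are hypothetical names, not BookKeeper APIs, and the in-memory store merely stands in for the versioned ledger metadata. The bookie names x, y, z, a, b follow the example in the thread; the bookie c in the last step is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory model of a version-checked metadata update (not BookKeeper code). */
public class EnsembleRaceSketch {

    /** Raised when the caller's expected version is stale, analogous to a BadVersion error. */
    static class BadVersionException extends Exception {
        BadVersionException(String msg) { super(msg); }
    }

    /** Holds the current ensemble plus a version that is bumped on every successful write. */
    static class VersionedMetadata {
        private List<String> ensemble;
        private int version = 0;

        VersionedMetadata(List<String> initial) { this.ensemble = new ArrayList<>(initial); }

        synchronized int version() { return version; }

        synchronized List<String> ensemble() { return new ArrayList<>(ensemble); }

        /** Conditional write: succeeds only if expectedVersion equals the stored version. */
        synchronized void compareAndSet(int expectedVersion, List<String> newEnsemble)
                throws BadVersionException {
            if (expectedVersion != version) {
                throw new BadVersionException(
                        "expected version " + expectedVersion + " but found " + version);
            }
            ensemble = new ArrayList<>(newEnsemble);
            version++;
        }
    }

    /** Replays the interleaving; returns true if the client's second update is rejected. */
    static boolean replayRace(VersionedMetadata md) throws BadVersionException {
        // Client step 1: bookie y failed, so the client replaces it with a.
        md.compareAndSet(md.version(), Arrays.asList("x", "a", "z"));
        int clientView = md.version(); // the version the client remembers

        // Recovery runs before a holds the pending entry and replaces a with b.
        md.compareAndSet(md.version(), Arrays.asList("x", "b", "z"));

        // Client step 2: another ensemble change, issued with the stale version.
        try {
            md.compareAndSet(clientView, Arrays.asList("x", "a", "c"));
            return false;
        } catch (BadVersionException e) {
            return true; // conflict detected; in the thread the client then closes the ledger
        }
    }

    public static void main(String[] args) throws BadVersionException {
        VersionedMetadata md = new VersionedMetadata(Arrays.asList("x", "y", "z"));
        System.out.println("client update rejected: " + replayRace(md));
        System.out.println("final ensemble: " + md.ensemble());
    }
}
```

In this model the recovery write wins and the client is left holding a stale version, which matches the outcome reported above: the conflict cannot be resolved and the ledger is closed.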
