Hi Uma,

It sounds like the replication worker shouldn't have written:

401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184

If I'm not missing anything, the replication worker should update an existing 
entry in the metadata, not create a new entry.
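A minimal sketch of that suggestion. It assumes the ensembles section of the metadata can be modelled as a sorted map from each fragment's first entry id to its bookie list, which is a simplification and not the real LedgerMetadata class. The hypothetical replaceBookie helper swaps the failed bookie inside the existing fragment instead of adding a new key. The addresses are sample values taken from the metadata quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model: ensembles as a sorted map from the first entry id of each
// fragment to the bookies that store it (not the real LedgerMetadata class).
final class EnsembleRepair {

    // Replace a failed bookie inside the fragment that starts at fragmentStart.
    // The map keeps the same keys, so no new fragment boundary is introduced.
    static void replaceBookie(SortedMap<Long, List<String>> ensembles,
                              long fragmentStart, String failed, String replacement) {
        List<String> current = ensembles.get(fragmentStart);
        if (current == null) {
            throw new IllegalArgumentException("no fragment starts at entry " + fragmentStart);
        }
        int index = current.indexOf(failed);
        if (index < 0) {
            throw new IllegalArgumentException(failed + " is not in fragment " + fragmentStart);
        }
        List<String> updated = new ArrayList<>(current);
        updated.set(index, replacement);
        ensembles.put(fragmentStart, updated);
    }

    public static void main(String[] args) {
        SortedMap<Long, List<String>> ensembles = new TreeMap<>();
        ensembles.put(0L, List.of("10.18.40.155:3181", "10.18.40.155:3182", "10.18.40.155:3183"));
        ensembles.put(102L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3183"));

        // Re-replicate the fragment starting at entry 102 from 3183 to 3184.
        replaceBookie(ensembles, 102L, "10.18.40.155:3183", "10.18.40.155:3184");

        System.out.println(ensembles.keySet());   // [0, 102]
        System.out.println(ensembles.get(102L));  // [10.18.40.155:3181, 10.18.40.155:3185, 10.18.40.155:3184]
    }
}
```

Because the keys are unchanged, the fragment boundaries stay exactly as the client wrote them.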

-Flavio

On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:

> Hi,
> 
> It looks like there is a race between the LedgerChecker and ensemble reformation
> on the client side.
> 
> When a bookie in the ensemble fails, the client tries to reform the ensemble in
> handleBookieFailure.
> 
> While it is reforming the ensemble, it resends the write request to the new
> bookie (the one added to the new ensemble).
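A simplified model of this client-side step, using the same sorted-map view of the ensembles. It assumes, without this being stated in the thread, that the new ensemble is keyed at the first unacknowledged entry (lastAddConfirmed + 1). The changeEnsemble method and the bookie names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a client-side ensemble change: unlike re-replication of an
// existing fragment, it starts a new fragment at the first unacknowledged entry.
final class ClientEnsembleChange {

    static long changeEnsemble(TreeMap<Long, List<String>> ensembles,
                               long lastAddConfirmed, String failed, String replacement) {
        // Copy the ensemble of the last (currently written) fragment.
        List<String> newEnsemble = new ArrayList<>(ensembles.lastEntry().getValue());
        int index = newEnsemble.indexOf(failed);
        if (index < 0) {
            throw new IllegalArgumentException(failed + " is not in the current ensemble");
        }
        newEnsemble.set(index, replacement);
        long newFragmentStart = lastAddConfirmed + 1;  // assumed keying rule
        ensembles.put(newFragmentStart, newEnsemble);
        // Pending writes from newFragmentStart onward would be resent to newEnsemble.
        return newFragmentStart;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        ensembles.put(0L, List.of("bookie1", "bookie2", "bookie3"));
        long start = changeEnsemble(ensembles, 99L, "bookie3", "bookie4");
        System.out.println(start + " -> " + ensembles.get(start));  // 100 -> [bookie1, bookie2, bookie4]
    }
}
```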
> 
> If, at the same time, the ReplicationWorker is triggered on the same ledger and
> runs the LedgerChecker on it, the LedgerChecker may also report this last failed
> entry as a fragment, because the ensemble change has already been updated in the
> metadata.
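To see how the last fragment can be picked up, here is a hypothetical sketch (not the actual LedgerChecker code) that derives fragments from the same sorted-map model. Each fragment runs from its key to the entry before the next key; the final one ends at the ledger's last entry, or stays open while the ledger is still being written. Every fragment that lists the failed bookie is reported, the open one included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: derive fragments from the ensembles map and keep those that
// contain a failed bookie. Not the actual LedgerChecker implementation.
final class FragmentScan {

    static final class Fragment {
        final long first;
        final long last;  // -1 when the ledger is open and the fragment has no fixed end
        final List<String> bookies;

        Fragment(long first, long last, List<String> bookies) {
            this.first = first;
            this.last = last;
            this.bookies = bookies;
        }

        @Override
        public String toString() {
            return "[" + first + ".." + (last < 0 ? "open" : String.valueOf(last)) + "] " + bookies;
        }
    }

    // lastEntryId is the ledger's last entry, or -1 if the ledger is still open.
    static List<Fragment> fragmentsWithBookie(TreeMap<Long, List<String>> ensembles,
                                              long lastEntryId, String failedBookie) {
        List<Map.Entry<Long, List<String>>> entries = new ArrayList<>(ensembles.entrySet());
        List<Fragment> result = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            long first = entries.get(i).getKey();
            // A fragment ends just before the next one starts; the last one ends at lastEntryId.
            long last = (i + 1 < entries.size()) ? entries.get(i + 1).getKey() - 1 : lastEntryId;
            if (entries.get(i).getValue().contains(failedBookie)) {
                result.add(new Fragment(first, last, entries.get(i).getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        ensembles.put(0L, List.of("10.18.40.155:3181", "10.18.40.155:3182", "10.18.40.155:3183"));
        ensembles.put(102L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3183"));
        // While the client is still writing, the ledger is open (lastEntryId = -1).
        for (Fragment f : fragmentsWithBookie(ensembles, -1L, "10.18.40.155:3183")) {
            System.out.println(f);
        }
    }
}
```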
> 
> If the ReplicationWorker replicates this last fragment, then
> ChangeEnsembleCb#operationComplete will fail with BadVersion, because the
> ensemble data has already been updated by the ReplicationWorker.
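The BadVersion failure follows the usual optimistic-concurrency pattern: each writer reads the metadata together with its version and writes back conditionally on that version, so whichever writer comes second is rejected. The sketch below models this with a hypothetical in-memory VersionedStore; in the real system the conditional write presumably corresponds to ZooKeeper's setData with an expected version, which fails with a BadVersion error on mismatch.

```java
// Hypothetical in-memory stand-in for a versioned metadata znode. The conditional
// write mirrors the idea behind ZooKeeper's setData(path, data, expectedVersion).
final class VersionedStore {

    static final class BadVersionException extends Exception {
        BadVersionException(int expected, int actual) {
            super("expected version " + expected + " but found " + actual);
        }
    }

    private String data;
    private int version;

    VersionedStore(String initialData) {
        this.data = initialData;
        this.version = 0;
    }

    synchronized int version() { return version; }

    synchronized String read() { return data; }

    // Succeeds only if nobody else has written since expectedVersion was read.
    synchronized int write(String newData, int expectedVersion) throws BadVersionException {
        if (expectedVersion != version) {
            throw new BadVersionException(expectedVersion, version);
        }
        data = newData;
        return ++version;
    }

    public static void main(String[] args) {
        VersionedStore metadata = new VersionedStore("0: 3181 3182 3183\n102: 3181 3185 3183");
        int clientVersion = metadata.version();  // read by the client before its ensemble change
        int workerVersion = metadata.version();  // read by the replication worker at the same time
        try {
            metadata.write(metadata.read() + "\n401: 3181 3185 3184", workerVersion);  // worker writes first
            metadata.write("client's new ensemble", clientVersion);                    // stale version
            System.out.println("client update accepted");
        } catch (BadVersionException e) {
            System.out.println("client update rejected: " + e.getMessage());
        }
    }
}
```

Running main prints "client update rejected: expected version 0 but found 1", which is the shape of the conflict in the log below.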
> 
> 
> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
> 
> 2012-06-23 10:51:47,814 - ERROR [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not resolve ledger metadata conflict while changing ensemble to: [/10.18.40.155:3182, /10.18.40.155:3185, /10.18.40.155:3184], old meta data is
> BookieMetadataFormatVersion        1
> 2
> 3
> 0
> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
> , new meta data is
> BookieMetadataFormatVersion        1
> 2
> 3
> 0
> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
> ,closing ledger
> 
> 
> After this, the client closes the ledger:
> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
> 
> The final ledger metadata then looks like this:
> 
> 0        10.18.40.155:3181        10.18.40.155:3182        10.18.40.155:3183
> 102        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3183
> 401        10.18.40.155:3181        10.18.40.155:3185        10.18.40.155:3184
> 400   CLOSED
> 
> This is because the last successful entry the client knows of is 400. Am I
> missing something here?
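The problem with the final state can be checked mechanically: once the ledger is closed at entry 400, an ensemble that starts at 401 can never contain an entry. A hypothetical check over the same sorted-map model (not a BookKeeper API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical consistency check, not part of BookKeeper: in a closed ledger,
// an ensemble whose first entry is greater than lastEntryId can hold no entries.
final class ClosedLedgerCheck {

    static List<Long> emptyTrailingEnsembles(TreeMap<Long, List<String>> ensembles, long lastEntryId) {
        // tailMap(k) includes k, so this selects every key >= lastEntryId + 1.
        return new ArrayList<>(ensembles.tailMap(lastEntryId + 1).keySet());
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        ensembles.put(0L, List.of("10.18.40.155:3181", "10.18.40.155:3182", "10.18.40.155:3183"));
        ensembles.put(102L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3183"));
        ensembles.put(401L, List.of("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3184"));
        System.out.println(emptyTrailingEnsembles(ensembles, 400L));  // [401]
    }
}
```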
> 
> Regards,
> 
> Uma