Hi Uma,

It sounds like the replication worker shouldn't have written:
401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
If I'm not missing anything, the replication worker should update an existing
entry in the metadata, not create a new entry.
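
To make the distinction concrete, here is a minimal sketch of what I mean. It is not BookKeeper code: VersionedMetadata and replaceBookie are hypothetical names, and the version check only stands in for the versioned write that fails with a bad version when someone else got there first. The point is that replacing a bookie in the ensemble that already covers the fragment leaves the set of ensemble start entries unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class EnsembleUpdateSketch {

    /** Hypothetical stand-in for versioned ledger metadata (not BookKeeper's LedgerMetadata). */
    static final class VersionedMetadata {
        final TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        int version = 0;

        /** Compare-and-set write: accepted only if the caller read the current version. */
        synchronized boolean write(int expectedVersion, SortedMap<Long, List<String>> updated) {
            if (expectedVersion != version) {
                return false; // stale read, analogous to a bad-version failure
            }
            TreeMap<Long, List<String>> snapshot = new TreeMap<>(updated);
            ensembles.clear();
            ensembles.putAll(snapshot);
            version++;
            return true;
        }
    }

    /**
     * Hypothetical helper: replaces a failed bookie inside the existing ensemble
     * that covers entryId. It modifies the value under an existing key and never
     * adds a new start-entry key.
     */
    static TreeMap<Long, List<String>> replaceBookie(SortedMap<Long, List<String>> current,
                                                     long entryId, String failed, String replacement) {
        TreeMap<Long, List<String>> copy = new TreeMap<>();
        for (Map.Entry<Long, List<String>> e : current.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        Long startEntry = copy.floorKey(entryId); // ensemble that covers entryId
        if (startEntry == null) {
            throw new IllegalArgumentException("no ensemble covers entry " + entryId);
        }
        List<String> ensemble = copy.get(startEntry);
        int idx = ensemble.indexOf(failed);
        if (idx < 0) {
            throw new IllegalArgumentException(failed + " is not in the ensemble at " + startEntry);
        }
        ensemble.set(idx, replacement);
        return copy;
    }

    public static void main(String[] args) {
        VersionedMetadata meta = new VersionedMetadata();
        TreeMap<Long, List<String>> initial = new TreeMap<>();
        initial.put(0L, Arrays.asList("10.18.40.155:3181", "10.18.40.155:3182", "10.18.40.155:3183"));
        initial.put(102L, Arrays.asList("10.18.40.155:3181", "10.18.40.155:3185", "10.18.40.155:3183"));
        meta.write(0, initial);

        int readVersion = meta.version;
        TreeMap<Long, List<String>> updated =
                replaceBookie(meta.ensembles, 102L, "10.18.40.155:3183", "10.18.40.155:3184");

        System.out.println("first write accepted: " + meta.write(readVersion, updated));
        System.out.println("start entries: " + meta.ensembles.keySet());
        // A second writer that still holds the old version is rejected.
        System.out.println("stale write accepted: " + meta.write(readVersion, updated));
    }
}
```

In this sketch, a writer that loses the version check would have to re-read the metadata and retry rather than push its own view.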
-Flavio
On Jun 26, 2012, at 6:07 PM, Uma Maheswara Rao G wrote:
> Hi,
>
> It looks like there is a race between the LedgerChecker and ensemble
> reformation from the client.
>
> When one bookie in the ensemble quorum fails, the client tries to reform
> the ensemble in handleBookieFailure.
>
> At this point it is reforming the ensemble and resending the write request
> to the new bookie (the one added to the new ensemble).
>
> If, at the same time, the ReplicationWorker triggers on the same ledger and
> runs the LedgerChecker on it, the LedgerChecker may also report this last
> failed entry as a fragment, because the ensemble change has already been
> written to the metadata.
>
> If the ReplicationWorker replicates this last fragment, then
> ChangeEnsembleCb#operationComplete will fail with BadVersion, because the
> ensemble data has already been updated by the ReplicationWorker.
>
>
> LOG.error("Could not resolve ledger metadata conflict while changing ensemble to: "
>           + newEnsemble + ", old meta data is \n" + new String(metadata.serialize())
>           + "\n, new meta data is \n" + new String(newMeta.serialize()) + "\n ,closing ledger");
>
> 2012-06-23 10:51:47,814 - ERROR
> [main-EventThread:LedgerHandle$1ChangeEnsembleCb$1$1@714] - Could not resolve
> ledger metadata conflict while changing ensemble to: [/10.18.40.155:3182,
> /10.18.40.155:3185, /10.18.40.155:3184], old meta data is
> BookieMetadataFormatVersion 1
> 2
> 3
> 0
> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
> , new meta data is
> BookieMetadataFormatVersion 1
> 2
> 3
> 0
> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
> ,closing ledger
>
>
> After this, it closes the ledger:
> asyncCloseInternal(NoopCloseCallback.instance, null, rc);
>
> In the end, the ledger metadata will look like:
>
> 0 10.18.40.155:3181 10.18.40.155:3182 10.18.40.155:3183
> 102 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3183
> 401 10.18.40.155:3181 10.18.40.155:3185 10.18.40.155:3184
> 400 CLOSED
>
> This is because the last successful entry known to the client is 400. Am I
> missing something here?
>
>
>
>
>
> Regards,
>
> Uma