[
https://issues.apache.org/jira/browse/BOOKKEEPER-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737087#comment-13737087
]
Sijie Guo commented on BOOKKEEPER-667:
--------------------------------------
{quote}
There's a possibility of a NoSuchElementException in #isConflictWith(), where
the ensembles are different sizes (when calling keyIter.next()).
{quote}
How would this happen? The size is checked before iterating, if you read the code.
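As a rough illustration of the guard described here, and not the actual BookKeeper code: the sketch below assumes the ensembles are held in a sorted map keyed by first entry id, and every name in it (EnsembleConflictSketch, the String bookie addresses) is hypothetical. Because the sizes are compared first, the second iterator can never run out inside the loop.

```java
import java.util.Iterator;
import java.util.List;
import java.util.SortedMap;

// Hypothetical stand-in for the metadata comparison; not BookKeeper's implementation.
final class EnsembleConflictSketch {
    // Each map goes from the first entry id of a segment to that segment's ensemble.
    static boolean isConflictWith(SortedMap<Long, List<String>> mine,
                                  SortedMap<Long, List<String>> theirs) {
        // Guard: a different number of segments is already a conflict,
        // and returning here keeps the two key sets the same length below.
        if (mine.size() != theirs.size()) {
            return true;
        }
        Iterator<Long> keyIter = theirs.keySet().iterator();
        for (Long key : mine.keySet()) {
            // keyIter.next() cannot throw NoSuchElementException:
            // both key sets have the same number of elements.
            if (!key.equals(keyIter.next())) {
                return true;
            }
        }
        return false;
    }
}
```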
{quote}
I'm wondering if it would be worth creating an EnsembleChange object, which is
basically a tuple of [EntryId, IndexOfBookieToReplace, NewBookie]. Then when we
get a MetadataConflict, we could reread the metadata, replace the current
metadata with the newly read metadata (after all, what's in zk is the true
configuration), and then reapply the EnsembleChange. I think this would be
cleaner and safer than the current set of heuristics we use in resolveMetadata.
Alternatively, we could leave out the NewBookie from the tuple, and choose a
new bookie each time it reruns (this would avoid putting the same bookie in the
ensemble twice).
{quote}
I haven't thought carefully about your proposal, but in general, as this bug is
marked for both 4.2.2 and 4.3.0, I would expect a simple bug fix rather than a
refactoring. If you want to refactor this part, please do it in a separate JIRA
targeted only at 4.3.0.
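For anyone picking up that separate JIRA, the quoted proposal could look roughly like the sketch below. It is only an illustration under assumptions: the class, its fields, and the applyTo method are hypothetical names, bookies are plain strings, and the freshly read metadata is assumed to be a non-empty sorted map from first entry id to ensemble.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the proposed tuple; none of these names are BookKeeper API.
final class EnsembleChange {
    final long entryId;      // EntryId: first entry of the new segment
    final int bookieIndex;   // IndexOfBookieToReplace
    final String newBookie;  // NewBookie

    EnsembleChange(long entryId, int bookieIndex, String newBookie) {
        this.entryId = entryId;
        this.bookieIndex = bookieIndex;
        this.newBookie = newBookie;
    }

    // Reapply the change on top of metadata that was just reread from ZooKeeper.
    // Assumes freshEnsembles is non-empty; the input map is left unmodified.
    SortedMap<Long, List<String>> applyTo(SortedMap<Long, List<String>> freshEnsembles) {
        SortedMap<Long, List<String>> result = new TreeMap<>(freshEnsembles);
        // Start from the latest ensemble in the reread metadata.
        List<String> latest = new ArrayList<>(freshEnsembles.get(freshEnsembles.lastKey()));
        // Swap out the failed bookie and record the new segment at entryId.
        latest.set(bookieIndex, newBookie);
        result.put(entryId, latest);
        return result;
    }
}
```

On a MetadataConflict, the caller would reread the metadata and call applyTo again on the new copy. The variant without NewBookie would pick the replacement inside applyTo, excluding bookies already present in the latest ensemble.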
> Client write will fail with BadMetadataVersion in case of multiple Bookie
> failures with AutoRecovery enabled
> ------------------------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-667
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-667
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-auto-recovery
> Affects Versions: 4.2.1, 4.3.0
> Reporter: Vinay
> Assignee: Sijie Guo
> Priority: Blocker
> Fix For: 4.2.2, 4.3.0
>
> Attachments: BOOKKEEPER-667.diff, BOOKKEEPER-667.diff,
> BOOKKEEPER-667.patch, MetatadaConflictTest.patch
>
>
> Scenario:
> ------------
> 1. Start a cluster with enough bookies, say 4, with autorecovery enabled.
> 2. Create a ledger and write some entries.
> 3. Restart one of the bookies.
> 4. Write some more entries.
> 5. Wait until autorecovery completes replication of the first
> segment.
> 6. Now restart one of the bookies in the latest ensemble.
> 7. Continue to write.
> Here the second ensemble change will fail, throwing BadMetadataVersion.