[
https://issues.apache.org/jira/browse/BOOKKEEPER-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Kelly updated BOOKKEEPER-584:
----------------------------------
Fix Version/s: 4.2.2
> Data loss when ledger metadata is overwritten
> ---------------------------------------------
>
> Key: BOOKKEEPER-584
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-584
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-client
> Affects Versions: 4.2.0
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Priority: Critical
> Fix For: 4.3.0, 4.2.2
>
> Attachments: BOOKKEEPER-584.diff, BOOKKEEPER-584.diff,
> BOOKKEEPER-584.diff
>
>
> This issue was introduced by the fix for BOOKKEEPER-337. The original
> #resolveConflicts logic was replaced by a check of only the ledger state and
> the current ensemble, a change intended to handle multiple bookies changing
> in the same ensemble.
> The issue can be reproduced by a test case with the following steps:
> 1. Ledger L writes several entries to the ensemble A, B, C.
> 2. C succeeds, B fails with slow responses, and A fails with an unrecoverable
> issue.
> 3. L fails all the pending add ops and closes the ledger with lastEntryId
> = -1 (since no add operation succeeded).
> 4. Ownership of the ledger is released and transferred to another machine
> (this is the normal use case for Hedwig).
> 5. The new owner opens ledger L and recovers it. Suppose A and B are back to
> normal by then, so L is closed with a lastEntryId other than -1.
> 6. The old owner, although it has closed the ledger, does not block the
> responses for the already failed pending add ops. The failures for B
> therefore trigger an ensemble change. Since the ledger metadata has already
> been changed by the new owner, the old owner has to resolve the conflict, and
> it updates the ledger metadata with lastEntryId = -1 again. The ledger thus
> has different lastEntryId values at different times, which causes
> inconsistency and data loss.
> A test case would describe the details of this sequence more clearly.
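
The sequence above can be sketched as a small, self-contained Java model. This is only an illustration under stated assumptions: the names LedgerMetadataSketch, Metadata, resolveBlindly and resolveGuarded are hypothetical and are not BookKeeper classes or methods, and the guard shown is one plausible rule (never overwrite metadata that another client has already closed), not necessarily what the attached patch does.

```java
import java.util.Optional;

/** Hypothetical, simplified model of ledger metadata; not the BookKeeper classes. */
final class LedgerMetadataSketch {
    enum State { OPEN, CLOSED }

    /** Immutable metadata snapshot plus the version assigned by the metadata store. */
    static final class Metadata {
        final State state;
        final long lastEntryId;
        final int version;

        Metadata(State state, long lastEntryId, int version) {
            this.state = state;
            this.lastEntryId = lastEntryId;
            this.version = version;
        }
    }

    /** Unsafe resolution: re-bases the stale local copy on the stored version and overwrites. */
    static Optional<Metadata> resolveBlindly(Metadata local, Metadata stored) {
        return Optional.of(new Metadata(local.state, local.lastEntryId, stored.version + 1));
    }

    /** Guarded resolution (assumed rule): give up if the stored metadata is already closed. */
    static Optional<Metadata> resolveGuarded(Metadata local, Metadata stored) {
        if (stored.state == State.CLOSED) {
            return Optional.empty(); // keep what the new owner wrote
        }
        return Optional.of(new Metadata(local.state, local.lastEntryId, stored.version + 1));
    }

    public static void main(String[] args) {
        // Step 3: the old owner closed the ledger locally with lastEntryId = -1.
        Metadata oldOwnerLocal = new Metadata(State.CLOSED, -1L, 1);
        // Step 5: the new owner recovered the ledger and closed it at entry 2 (example value).
        Metadata stored = new Metadata(State.CLOSED, 2L, 2);

        // Step 6: the late failure from B makes the old owner resolve the version conflict.
        long blind = resolveBlindly(oldOwnerLocal, stored)
                .map(m -> m.lastEntryId).orElse(stored.lastEntryId);
        long guarded = resolveGuarded(oldOwnerLocal, stored)
                .map(m -> m.lastEntryId).orElse(stored.lastEntryId);

        System.out.println("blind resolution, lastEntryId = " + blind);
        System.out.println("guarded resolution, lastEntryId = " + guarded);
    }
}
```

In this model the blind resolution replaces the recovered lastEntryId with -1, which is the data loss described in step 6, while the guarded resolution leaves the new owner's metadata untouched.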