[
https://issues.apache.org/jira/browse/BOOKKEEPER-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242095#comment-13242095
]
Rakesh R commented on BOOKKEEPER-126:
-------------------------------------
So if I understand the conclusion correctly, we have discussed and identified
two cases to be implemented as part of this JIRA:
# *When ledger flushing fails with an IOException*
+Soln+ read-only (r-o) mode:
>> On an IOException, a bookie with multiple ledger dirs (say /tmp/bk1-data,
/tmp/bk2-data, etc.) should try the next ledger dir for writing and mark the
failed dirs as BAD_FOR_WRITE. If no dir succeeds, it should switch to r-o mode.
>> Also, if the journal fails with an IOException, it should switch to r-o
mode immediately.
Shall I open a subtask for the impl?
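The failover described above could be sketched roughly as follows. This is a minimal illustration, not BookKeeper's actual API: the class and method names (LedgerDirSelector, pickWritableDir, markBadForWrite) are hypothetical.

```java
import java.io.File;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: on an IOException the bookie marks the failed dir
// BAD_FOR_WRITE, tries the next configured ledger dir, and switches to
// read-only mode only once every dir has failed.
class LedgerDirSelector {
    private final List<File> ledgerDirs;
    private final Set<File> badForWrite = new HashSet<>();
    private volatile boolean readOnly = false;

    LedgerDirSelector(List<File> ledgerDirs) {
        this.ledgerDirs = ledgerDirs;
    }

    // Pick a writable dir, skipping dirs already marked BAD_FOR_WRITE.
    // Returns null and flips to r-o mode when no usable dir remains.
    File pickWritableDir() {
        for (File dir : ledgerDirs) {
            if (!badForWrite.contains(dir)) {
                return dir;
            }
        }
        readOnly = true;  // no usable ledger dir left: switch to r-o mode
        return null;
    }

    // Called when a flush to 'dir' throws an IOException.
    void markBadForWrite(File dir) {
        badForWrite.add(dir);
    }

    boolean isReadOnly() {
        return readOnly;
    }
}
```

A journal IOException would bypass this loop entirely and set the r-o flag directly, per the second point above.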
# *Ledger entries got corrupted due to disk failures or bad sectors*
+Soln+ scanner approach:
IMHO, the healing procedure would be the following sequence:
* a) Scan and build the map of owned entries:
>> On startup the bookie would fetch the ledger metadata from ZK, and on
every write it would update its ledger metadata map.
>> A special data structure <ledgerDirId, <entryId, replica bookies>> needs
to be designed for this, containing the ledgerId, the entries owned, the
ledger dirs, etc.?
* b) Read the entries and identify any missing entries:
Yeah, the DistributionScheduling happens on the client side, and batch
reading is also good.
Since the ledgers are local to the server, how about reading them directly
instead of using PerChannelBookieClient?
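The local scan in step (b) could look like the sketch below: walk the expected entry range, attempt a direct local read for each entry, and collect the ids that fail. EntryReader and MissingEntryScanner are assumed names, not real BookKeeper classes.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed interface standing in for a direct local read path
// (i.e. not going through PerChannelBookieClient).
interface EntryReader {
    // True if entryId of ledgerId can be read intact from local disk.
    boolean readLocally(long ledgerId, long entryId);
}

class MissingEntryScanner {
    // Scan [firstEntry, lastEntry] and return the ids that are
    // corrupt or absent on this bookie.
    static List<Long> findMissing(EntryReader reader, long ledgerId,
                                  long firstEntry, long lastEntry) {
        List<Long> missing = new ArrayList<>();
        for (long id = firstEntry; id <= lastEntry; id++) {
            if (!reader.readLocally(ledgerId, id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```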
* c) Initiate re-replication:
The corrupted bookie first identifies the peer bookie that has a copy and
sends it a notification to re-replicate. ZK watchers could be used for this
notification: each bookie would listen on a specific persistent znode, say
'underreplicaEntries'. The corrupted bookie would write the data
<ledgerId, missingEntryIds> to the 'underreplicaEntries' znode of the
corresponding bookie that has the copy. On notification, the peer bookie
would use the same DistributionScheduling logic that exists on the client
side.
Is it legal for the server to depend on the client? Otherwise, could the
server randomly select a re-replica bookie and update the ZK ledger metadata?
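The notification payload and znode layout for step (c) might look like the sketch below. The path layout and the string encoding are pure assumptions for illustration; the actual write would go through the ZooKeeper client (setData on the peer's znode) so that the peer's watcher fires.

```java
import java.util.List;

// Hypothetical sketch: the corrupted bookie serialises
// <ledgerId, missingEntryIds> and writes it under the peer bookie's
// 'underreplicaEntries' znode to trigger re-replication.
class UnderReplicaNotification {
    // Assumed znode layout: one child per bookie under /underreplicaEntries.
    static String znodePath(String peerBookie) {
        return "/underreplicaEntries/" + peerBookie;
    }

    // Illustrative wire format: "ledgerId:e1,e2,...".
    static String encode(long ledgerId, List<Long> missingEntryIds) {
        StringBuilder sb = new StringBuilder().append(ledgerId).append(':');
        for (int i = 0; i < missingEntryIds.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(missingEntryIds.get(i));
        }
        return sb.toString();
    }
}
```

On the watch event, the peer would decode this payload, pick replacement bookies (randomly, or reusing the client-side DistributionScheduling logic as discussed above), and update the ledger metadata in ZK.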
How would the ZK ledger metadata ('nextReplicaIndexToReadFrom') look after
re-replication?
For example, say the ledger metadata mapping for entries 0-100 is:
0 (A, B, C)
50 (B, C, D)
End Ledger: 100
Assume entries 30 to 39 got corrupted on B and were re-replicated to E. Would
it then be:
0 (A, B, C)
30 (E, B, C)
40 (B, C, D)
50 (B, C, D)
If you agree with the above approaches, I can probably do a detailed write-up.
@Sijie
another tough thing is we need to tell closed ledger from opened/in-recovery
ledger, when handling last ensemble of opened/in-recovery ledger.
I am missing something here; could you give more details on this?
> EntryLogger doesn't detect when one of it's logfiles is corrupt
> ---------------------------------------------------------------
>
> Key: BOOKKEEPER-126
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-126
> Project: Bookkeeper
> Issue Type: Bug
> Reporter: Ivan Kelly
> Priority: Blocker
> Fix For: 4.1.0
>
>
> If an entry log is corrupt, the bookie will ignore any entries past the
> corruption. Quorum writes stops this being a problem at the moment, but we
> should detect corruptions like this and rereplicate if necessary.