Yiming Zang created BOOKKEEPER-960:
--------------------------------------
Summary: Re-replicator: bookie checking should handle flapping
bookie registration
Key: BOOKKEEPER-960
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-960
Project: Bookkeeper
Issue Type: Improvement
Components: bookkeeper-server
Reporter: Yiming Zang
Assignee: Yiming Zang
Problem:
Currently the re-replicator only uses the view of available/read-only bookies
at the moment it performs bookie checking. As a result, it can mistake a bookie
that has temporarily disappeared from ZooKeeper (e.g. ZooKeeper session
expired, bookie restarted, flapping bookie registration due to network/GC) for
a lost bookie, which introduces unnecessary re-replication.
Solution:
Introduce 'auditorStaleBookieInterval': if a bookie does not register within
the given interval, it is marked as 'stale' and all ledgers belonging to that
bookie are re-replicated. The default value is 30 minutes.
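
The stale-bookie check described above could be sketched roughly as follows.
This is a minimal illustration, not the actual patch: the class
StaleBookieTracker, its method names, and the use of milliseconds are all
assumptions made for the example. Only the 30-minute default and the idea of
tolerating a bookie that re-registers within the interval come from this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not BookKeeper code): tracks how long each bookie has
 * been missing from the registered set and reports the ones missing for at
 * least the stale interval.
 */
class StaleBookieTracker {
    private final long staleIntervalMs;
    // Bookies seen in the most recent registration update.
    private Set<String> known = new HashSet<>();
    // Bookie id -> time (ms) at which it was first noticed missing.
    private final Map<String, Long> missingSinceMs = new HashMap<>();

    StaleBookieTracker(long staleIntervalMs) {
        this.staleIntervalMs = staleIntervalMs;
    }

    /** Called with the current available/read-only bookies on each update. */
    synchronized void onBookiesChanged(Set<String> registered, long nowMs) {
        // A bookie that was registered before and is absent now starts
        // its grace period.
        for (String bookie : known) {
            if (!registered.contains(bookie)) {
                missingSinceMs.putIfAbsent(bookie, nowMs);
            }
        }
        // A bookie that re-registered (flapping, restart) is cleared.
        missingSinceMs.keySet().removeAll(registered);
        known = new HashSet<>(registered);
    }

    /** Bookies that have been missing for at least the stale interval. */
    synchronized Set<String> findStaleBookies(long nowMs) {
        Set<String> stale = new HashSet<>();
        for (Map.Entry<String, Long> e : missingSinceMs.entrySet()) {
            if (nowMs - e.getValue() >= staleIntervalMs) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }
}
```

With an interval of 30 minutes, a bookie that drops out of the registered set
and comes back a few minutes later is never reported, so none of its ledgers
would be re-replicated. Only a bookie that stays absent for the whole interval
is returned by findStaleBookies.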
Fixes:
- refactor the bookie watcher to allow notification of bookie list changes
through BookiesListener
- introduce 'auditorStaleBookieInterval' to mark bookies as 'stale' if they
have not registered themselves with ZooKeeper
- add more info-level logging for the critical steps of the re-replication logic
- misc changes