Yiming Zang created BOOKKEEPER-960:
--------------------------------------

             Summary: Re-replicator: bookie checking should handle flapping 
bookie registration
                 Key: BOOKKEEPER-960
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-960
             Project: Bookkeeper
          Issue Type: Improvement
          Components: bookkeeper-server
            Reporter: Yiming Zang
            Assignee: Yiming Zang


Problem:

Currently the re-replicator uses only the view of available/read-only bookies
taken at the moment bookie checking runs. As a result, it can mistakenly treat
a bookie that has temporarily disappeared from ZooKeeper (e.g. ZooKeeper
session expired, bookie restarted, flapping bookie registration due to
network/GC) as a lost bookie, which introduces unnecessary re-replication.

Solution:

Introduce 'auditorStaleBookieInterval': if a bookie does not register within
the given interval, it is marked as a 'stale' bookie and all ledgers belonging
to that bookie are re-replicated. The default value is 30 minutes.
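
To illustrate the idea, here is a minimal sketch of how a stale interval could
filter out flapping registrations. It is not the actual patch: the class
StaleBookieTracker, its method onBookiesChanged, and the use of plain strings
as bookie ids are all hypothetical names chosen for the example. Only the
interval concept and the 30 minute default come from this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not BookKeeper's Auditor code.
class StaleBookieTracker {
    private final long staleIntervalMs;
    // Bookies seen registered at least once and not yet declared stale.
    private final Set<String> known = new HashSet<>();
    // Bookie id -> time at which it was first observed missing.
    private final Map<String, Long> missingSince = new HashMap<>();

    StaleBookieTracker(long staleIntervalMs) {
        this.staleIntervalMs = staleIntervalMs;
    }

    // Feed the current set of registered bookies. Returns the bookies that
    // have been missing for at least the stale interval; only those would
    // be handed to re-replication.
    Set<String> onBookiesChanged(Set<String> registered, long nowMs) {
        // A bookie that re-registered is no longer missing (flap absorbed).
        missingSince.keySet().removeAll(registered);
        // A known bookie that is absent starts its clock, if not started.
        for (String bookie : known) {
            if (!registered.contains(bookie)) {
                missingSince.putIfAbsent(bookie, nowMs);
            }
        }
        known.addAll(registered);

        Set<String> stale = new HashSet<>();
        Iterator<Map.Entry<String, Long>> it = missingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() >= staleIntervalMs) {
                stale.add(e.getKey());
                known.remove(e.getKey()); // forget it until it registers again
                it.remove();
            }
        }
        return stale;
    }
}
```

With a 30 minute interval, a bookie that drops out of the registered set and
comes back a minute later is never reported, while one that stays away for the
full interval is reported exactly once.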

Fixes:

- refactor the bookie watcher to allow notifying the bookie list through
BookiesListener
- introduce 'auditorStaleBookieInterval' so that bookies can be marked as
'stale' if they have not registered themselves with ZooKeeper
- add more info logging for critical steps in the re-replication logic
- misc changes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
