Hi,

There is a class BookKeeperTools that has methods for complete recovery of a 
node. Recovery of a dead bookie involves first updating ZooKeeper with the 
replacement bookie and then replicating the necessary ledger entries. So, if 
the recovery process or the target bookie dies before the actual entries get 
copied, there can be data inconsistency issues.
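
To make the concern concrete, here is a rough Java sketch of the ordering I 
mean. The names below are purely illustrative placeholders I made up for this 
mail, not the actual BookKeeperTools API:

    // Hypothetical illustration of the recovery ordering described above.
    // None of these names come from BookKeeper; they are placeholders.
    public class RecoveryOrderingSketch {

        // Step 1: point the ledger metadata in ZooKeeper at the replacement bookie.
        static void updateMetadataWithReplacement(String deadBookie, String replacementBookie) {
            System.out.println("ZK metadata updated: " + deadBookie + " -> " + replacementBookie);
        }

        // Step 2: copy the entries the dead bookie held onto the replacement bookie.
        static void replicateEntries(String deadBookie, String replacementBookie) {
            System.out.println("Replicating entries from surviving quorum to " + replacementBookie);
        }

        public static void main(String[] args) {
            String deadBookie = "bookie1:3181";        // placeholder address
            String replacementBookie = "bookie4:3181"; // placeholder address

            updateMetadataWithReplacement(deadBookie, replacementBookie);
            // If the recovery process or the replacement bookie dies here, the
            // metadata already names a bookie that does not yet hold the data.
            replicateEntries(deadBookie, replacementBookie);
        }
    }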

Copying the data can take time, which increases the window during which a node 
can potentially fail. Is this an issue that needs to be addressed?

Also, this tool needs to be triggered manually to perform node recovery. Are 
there any plans for automatic node recovery (similar to Hadoop HDFS), in which, 
if a machine goes down, a background process replicates its data to maintain 
the replication factor (quorum)?

-regards
Amit