Adar Dembo created KUDU-2050:
--------------------------------
Summary: Avoid peer eviction during block manager startup
Key: KUDU-2050
URL: https://issues.apache.org/jira/browse/KUDU-2050
Project: Kudu
Issue Type: Bug
Components: fs, tserver
Affects Versions: 1.4.0
Reporter: Adar Dembo
Priority: Critical
In larger deployments we've observed that opening the block manager can take a
very long time: tens of minutes, and sometimes hours. This is especially true
as of 1.4, where the log block manager (LBM) tries to optimize on-disk data
structures during startup.
The default time to Raft peer eviction is 5 minutes. If one node is restarted
and LBM startup takes over 5 minutes, or if all nodes are restarted and there's
over 5 minutes of LBM startup time variance across them, the "slow" node could
have all of its replicas evicted. Besides generating a lot of unnecessary
re-replication work, this effectively defeats the LBM optimizations: it
would have been equally slow (but more efficient) to reformat the node instead.
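To make the two failure cases concrete, here is a minimal sketch of the timing condition described above. It is a simplified model, not Kudu code: the function name, the node names, and the startup durations are all hypothetical, and it assumes that in the full-restart case peers start waiting as soon as the fastest node is back up.

```python
from datetime import timedelta

# Default time to Raft peer eviction, as stated in this issue.
EVICTION_TIMEOUT = timedelta(minutes=5)


def nodes_at_risk(lbm_startup, restarted_all, timeout=EVICTION_TIMEOUT):
    """Return nodes whose replicas could be evicted (simplified model).

    lbm_startup maps a hypothetical node name to the time that node spends
    opening the block manager. Only restarted nodes are listed.
    """
    if restarted_all:
        # Assumption: peers start waiting once the fastest node is back up,
        # so only the spread relative to the fastest node matters.
        baseline = min(lbm_startup.values())
    else:
        # The rest of the cluster is already up and waiting from time zero.
        baseline = timedelta(0)
    return sorted(
        node for node, took in lbm_startup.items() if took - baseline > timeout
    )


# All nodes restarted: ts3 lags the fastest node by 11 minutes.
startup = {
    "ts1": timedelta(minutes=20),
    "ts2": timedelta(minutes=22),
    "ts3": timedelta(minutes=31),
}
print(nodes_at_risk(startup, restarted_all=True))  # ['ts3']

# One node restarted and back within the timeout: nothing is evicted.
print(nodes_at_risk({"ts1": timedelta(minutes=4)}, restarted_all=False))  # []
```

Note that in the full-restart case every node is slow in absolute terms, yet only the node that falls more than 5 minutes behind the others is at risk.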
So, let's reorder startup such that LBM startup counts towards replica
bootstrapping. One idea: adjust FsManager startup so that tablet-meta/cmeta
files can be accessed early to construct bootstrapping replicas, but defer
opening the block manager until after that.
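The proposed ordering could look roughly like the toy model below. This is only a sketch of the idea, not Kudu's actual startup code: the class, its methods, and the state names are hypothetical, and it assumes that a replica registered as bootstrapping is not treated by its peers as a candidate for eviction.

```python
class TabletServerSketch:
    """Toy model of the proposed startup order; not Kudu's real classes."""

    def __init__(self, tablet_ids):
        self.tablet_ids = list(tablet_ids)
        self.events = []         # ordered log of startup steps
        self.replica_state = {}  # tablet id -> hypothetical state name

    def open_metadata(self):
        # Step 1: make tablet-meta/cmeta readable without the block manager.
        self.events.append("open_metadata")

    def register_bootstrapping_replicas(self):
        # Step 2: construct replicas early, in a bootstrapping state.
        for tablet_id in self.tablet_ids:
            self.replica_state[tablet_id] = "BOOTSTRAPPING"
        self.events.append("register_replicas")

    def open_block_manager(self):
        # Step 3: the slow part, now counted as bootstrap time.
        self.events.append("open_block_manager")

    def finish_bootstrap(self):
        # Step 4: replicas become fully available.
        for tablet_id in self.tablet_ids:
            self.replica_state[tablet_id] = "RUNNING"
        self.events.append("finish_bootstrap")

    def start(self):
        self.open_metadata()
        self.register_bootstrapping_replicas()
        self.open_block_manager()
        self.finish_bootstrap()
        return self.events


server = TabletServerSketch(["tablet-a", "tablet-b"])
for step in server.start():
    print(step)
```

The point of the ordering is that replicas exist before the block manager is opened, so a long LBM startup lengthens bootstrapping rather than making the node look unavailable.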