Adar Dembo created KUDU-1374:
--------------------------------
Summary: Operations triggered by TS heartbeats may go unperformed
Key: KUDU-1374
URL: https://issues.apache.org/jira/browse/KUDU-1374
Project: Kudu
Issue Type: Sub-task
Components: master
Affects Versions: 0.7.1
Reporter: Adar Dembo
Assignee: Adar Dembo
Priority: Critical
(copying this from my multi-master design doc)
The inclusion or exclusion of a tablet in an incremental tablet report is
edge-triggered, and may result in a state-changing operation on the tserver,
communicated via an out-of-band RPC. This RPC is retried until it succeeds.
However, if the leader master dies *after* it responds to the
tserver's heartbeat but *before* the out-of-band RPC is sent, the
edge-triggered tablet report is effectively lost, and the state-changing operation
will not be performed until the next time the tablet is included in a tablet
report. Because the criteria for inclusion in a tablet report are narrow, operations may be
"missed" for quite some time.
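To make the race concrete, here is a minimal simulation under stated assumptions: the classes `TabletServer` and `Master`, their methods, and the `DeleteTablet` tuple are hypothetical names for illustration, not Kudu's actual classes or RPCs. The tserver clears a tablet's "dirty" flag as soon as a heartbeat is acknowledged, so if the old leader dies between acknowledging the heartbeat and sending its out-of-band RPC, the new leader never sees that tablet again.

```python
# Hypothetical model of the race; TabletServer, Master, and their methods are
# illustrative names, not Kudu's actual classes or RPCs.


class TabletServer:
    def __init__(self, tablets):
        self.tablets = set(tablets)
        # Tablets whose state changed since the last acknowledged report.
        # Every tablet starts dirty, so the first report covers all of them.
        self.dirty = set(tablets)

    def build_incremental_report(self):
        return set(self.dirty)

    def on_heartbeat_response(self, reported):
        # Once a heartbeat is acknowledged, the reported tablets are no longer
        # dirty: inclusion in the report is edge-triggered.
        self.dirty -= reported


class Master:
    def __init__(self, tablets_to_delete):
        # Tablets the master knows must be deleted (e.g. orphaned tables).
        self.tablets_to_delete = set(tablets_to_delete)
        self.pending_rpcs = []
        self.sent_rpcs = []

    def process_report(self, reported):
        # Queue state-changing operations triggered by the report.
        for tablet in sorted(reported & self.tablets_to_delete):
            self.pending_rpcs.append(("DeleteTablet", tablet))

    def send_pending_rpcs(self):
        # Out-of-band RPCs are only sent after the heartbeat is answered.
        self.sent_rpcs.extend(self.pending_rpcs)
        self.pending_rpcs.clear()


def simulate():
    ts = TabletServer(["t1", "t2"])

    # The old leader processes the report and acknowledges the heartbeat...
    old_leader = Master(tablets_to_delete=["t1"])
    report = ts.build_incremental_report()
    old_leader.process_report(report)
    ts.on_heartbeat_response(report)
    # ...but dies before calling send_pending_rpcs().

    # The new leader only receives incremental reports, which are now empty.
    new_leader = Master(tablets_to_delete=["t1"])
    report = ts.build_incremental_report()
    new_leader.process_report(report)
    new_leader.send_pending_rpcs()
    return new_leader.sent_rpcs


print(simulate())  # → []
```

Even though the new leader knows that t1 should be deleted, nothing in the incremental report prompts it to act.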
These operations include:
# Some tablet deletions, such as tablets belonging to orphaned tables, or
tablets whose deletion RPCs were sent and failed during an earlier
*DeleteTable()* request.
# Some tablet alters, such as tablets whose alter RPCs were sent and failed
during an earlier *AlterTable()* request.
# Config changes triggered by under-replicated tablets.
A simple fix is to require that tservers send a full tablet report when they
detect that a new leader master has been elected.
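A minimal sketch of the proposed fix, again using hypothetical names that are not Kudu's API: it assumes the tserver knows the current leader master's UUID before building each report, and remembers which leader acknowledged its last heartbeat. When the two differ, it sends a full report instead of an incremental one, which lets the new leader re-trigger any operation the old leader lost.

```python
# Hypothetical sketch of the proposed fix; names are illustrative, and the
# leader master's UUID is assumed to be known to the tserver before it
# builds each report.


class TabletServer:
    def __init__(self, tablets):
        self.tablets = set(tablets)
        self.dirty = set(tablets)
        self.last_leader_uuid = None  # leader that acknowledged our last heartbeat

    def build_report(self, leader_uuid):
        """Return (tablets_to_report, is_full_report)."""
        if leader_uuid != self.last_leader_uuid:
            # A different leader (or the first heartbeat): send every tablet
            # so the new leader can re-trigger any operation the old one lost.
            return set(self.tablets), True
        return set(self.dirty), False

    def on_heartbeat_response(self, leader_uuid, reported):
        self.last_leader_uuid = leader_uuid
        self.dirty -= reported


class Master:
    def __init__(self, uuid, tablets_to_delete):
        self.uuid = uuid
        self.tablets_to_delete = set(tablets_to_delete)
        self.pending_rpcs = []
        self.sent_rpcs = []

    def process_report(self, reported):
        # Queue state-changing operations triggered by the report.
        for tablet in sorted(reported & self.tablets_to_delete):
            self.pending_rpcs.append(("DeleteTablet", tablet))

    def send_pending_rpcs(self):
        self.sent_rpcs.extend(self.pending_rpcs)
        self.pending_rpcs.clear()


def simulate():
    ts = TabletServer(["t1", "t2"])

    old_leader = Master("master-a", tablets_to_delete=["t1"])
    report, _ = ts.build_report(old_leader.uuid)
    old_leader.process_report(report)
    ts.on_heartbeat_response(old_leader.uuid, report)
    # The old leader dies before sending its pending RPCs.

    new_leader = Master("master-b", tablets_to_delete=["t1"])
    report, is_full = ts.build_report(new_leader.uuid)
    assert is_full  # leader change detected, so the report is full
    new_leader.process_report(report)
    new_leader.send_pending_rpcs()
    return new_leader.sent_rpcs


print(simulate())  # → [('DeleteTablet', 't1')]
```

In this sketch the cost of the fix is one full report from each tserver per leader election; heartbeats to an unchanged leader stay incremental.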
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)