Adar Dembo created KUDU-1374:
--------------------------------

             Summary: Operations triggered by TS heartbeats may go unperformed
                 Key: KUDU-1374
                 URL: https://issues.apache.org/jira/browse/KUDU-1374
             Project: Kudu
          Issue Type: Sub-task
          Components: master
    Affects Versions: 0.7.1
            Reporter: Adar Dembo
            Assignee: Adar Dembo
            Priority: Critical


(copying this from my multi-master design doc)

The inclusion or exclusion of a tablet in an incremental tablet report is 
edge-triggered, and may result in a state-changing operation on the tserver, 
communicated via an out-of-band RPC. This RPC is retried until it succeeds. 
However, if the leader master dies *after* it responds to the tserver's 
heartbeat but *before* it sends the out-of-band RPC, the edge-triggered 
tablet report is effectively lost, and the state-changing operation will not 
be performed until the next time the tablet is included in a tablet report. 
Because the criteria for including a tablet in a report are narrow, 
operations may be "missed" for quite some time.
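
The window described above can be illustrated with a toy model. This is only a 
sketch, not Kudu code: the class and method names (TabletServer, Master, 
process_heartbeat, send_pending, simulate) are hypothetical, and it assumes 
the master keeps its pending out-of-band RPCs only in memory.

```python
class TabletServer:
    """Toy tserver: tracks its tablets and which ones changed since the last report."""

    def __init__(self, tablets):
        self.tablets = set(tablets)
        self.dirty = set(tablets)  # everything is new before the first report

    def incremental_report(self):
        # Edge-triggered: only tablets that changed since the last ack are included.
        return set(self.dirty)

    def ack(self, reported):
        # Once the master responds to the heartbeat, the tablets stop being reported.
        self.dirty -= reported


class Master:
    """Toy leader master: derives follow-up work from a tablet report."""

    def __init__(self, orphaned):
        self.orphaned = set(orphaned)  # tablets that should be deleted
        self.pending_rpcs = []         # assumption: in-memory only, lost on crash

    def process_heartbeat(self, report):
        for tablet in sorted(report):
            if tablet in self.orphaned:
                self.pending_rpcs.append(("DeleteTablet", tablet))

    def send_pending(self, ts):
        # Stands in for the out-of-band RPCs sent to the tserver.
        for op, tablet in self.pending_rpcs:
            if op == "DeleteTablet":
                ts.tablets.discard(tablet)
        self.pending_rpcs.clear()


def simulate(crash_before_rpc):
    ts = TabletServer({"t1", "t2"})
    leader = Master(orphaned={"t2"})

    report = ts.incremental_report()
    leader.process_heartbeat(report)
    ts.ack(report)  # the heartbeat response reaches the tserver

    if crash_before_rpc:
        # The leader dies here and its queued RPC is gone; a new leader takes over.
        leader = Master(orphaned={"t2"})
    else:
        leader.send_pending(ts)

    # Next heartbeat: nothing is dirty, so the incremental report is empty.
    report = ts.incremental_report()
    leader.process_heartbeat(report)
    ts.ack(report)
    leader.send_pending(ts)
    return ts.tablets
```

In this model, simulate(False) leaves only "t1" on the tserver, while 
simulate(True) leaves the orphaned "t2" in place, because the new leader never 
sees a report that mentions it.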

These operations include:
# Some tablet deletions, such as tablets belonging to orphaned tables, or 
tablets whose deletion RPCs were sent and failed during an earlier 
*DeleteTable()* request.
# Some tablet alters, such as tablets whose alter RPCs were sent and failed 
during an earlier *AlterTable()* request.
# Config changes sent due to under-replicated tablets.

A simple fix is to require that tservers send a full tablet report when they 
detect that a new leader master was elected.
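
A minimal sketch of that idea, from the tserver's side. The names (Heartbeater, 
build_report, on_ack, the "is_incremental" key) are hypothetical, and the 
sketch assumes the tserver already has some way to learn the current leader's 
identifier; how that detection works is not specified here.

```python
class Heartbeater:
    """Toy tserver-side heartbeater that remembers which leader last acked a report."""

    def __init__(self, tablets):
        self.tablets = set(tablets)
        self.dirty = set()             # tablets changed since the last acked report
        self.last_acked_leader = None  # hypothetical leader identifier

    def mark_dirty(self, tablet):
        self.dirty.add(tablet)

    def build_report(self, current_leader):
        # A leader we have not reported to yet gets the full tablet list;
        # a known leader gets only the changes.
        full = current_leader != self.last_acked_leader
        tablets = self.tablets if full else self.dirty
        return {"is_incremental": not full, "tablets": sorted(tablets)}

    def on_ack(self, leader):
        # Record the leader only after it has responded, so that a heartbeat
        # lost in flight still results in a full report on the next attempt.
        self.last_acked_leader = leader
        self.dirty.clear()


hb = Heartbeater({"t1", "t2"})

first = hb.build_report("master-a")       # first contact: full report
hb.on_ack("master-a")
steady = hb.build_report("master-a")      # same leader: incremental, nothing changed
failover = hb.build_report("master-b")    # new leader: full report again
```

With a full report after every leader change, the new leader sees every tablet 
again and can re-derive any deletion, alter, or config change that the old 
leader did not get to send.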



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
