[
https://issues.apache.org/jira/browse/KUDU-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jean-Daniel Cryans updated KUDU-1337:
-------------------------------------
Assignee: Mike Percy
Code Review: http://gerrit.cloudera.org:8080/#/c/2436/
Mike posted a patch for this, so assigning it to him :)
> DeleteTablet can cause spurious unfruitful remote bootstraps
> ------------------------------------------------------------
>
> Key: KUDU-1337
> URL: https://issues.apache.org/jira/browse/KUDU-1337
> Project: Kudu
> Issue Type: Task
> Components: recovery, tserver
> Affects Versions: 0.7.0
> Reporter: Adar Dembo
> Assignee: Mike Percy
>
> While triaging a cascading YCSB failure, we noticed the following sequence of
> events:
> # Client deleted a table.
> # Master serviced the request.
> # Master issued DeleteTablet for a particular tablet to a quorum of 3 peers.
> # Due to load or whatever, the followers received and processed the
> DeleteTablet before the leader.
> # The leader noticed the followers no longer had the tablet, and told
> them to remote bootstrap it from itself.
> # The leader began servicing the DeleteTablet.
> # The followers began remote bootstrapping, which killed the leader due to
> KUDU-1328. If the leader hadn't died, the followers' remote bootstrap
> sessions would have failed.
> # There's an open question for this step: is any bad "state" left in the
> followers? Or do the remote bootstrap sessions abort cleanly?
> Anyway, the fact that the replicas handled the DeleteTablet before the leader
> led to unnecessary remote bootstrap work. We should avoid this.
> Note: Todd suspects that delete_table-test's flakiness may be due to this
> behavior. I didn't look into it, but whoever tackles this should consider
> that possibility.
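
For readers who want to see the ordering problem in isolation, here is a toy sketch of steps 3 through 6 above. It is not Kudu code: the `Follower` class, the `deleted` marker, and the `reject_after_delete` guard are all hypothetical names made up for illustration, and the guard is only one conceivable mitigation, not necessarily what the patch under review does.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    name: str
    has_tablet: bool = True
    deleted: bool = False  # hypothetical marker: tablet was deleted on this replica

    def delete_tablet(self) -> None:
        self.has_tablet = False
        self.deleted = True

    def accept_bootstrap(self, reject_after_delete: bool) -> bool:
        # With the hypothetical guard on, a replica that already processed
        # DeleteTablet refuses to copy the tablet back from the leader.
        if reject_after_delete and self.deleted:
            return False
        return True


def simulate(reject_after_delete: bool) -> list:
    """Return the names of followers that start a remote bootstrap."""
    followers = [Follower("f1"), Follower("f2")]
    leader_has_tablet = True

    # Steps 3-4: the followers process DeleteTablet before the leader does.
    for f in followers:
        f.delete_tablet()

    # Step 5: the leader still has the tablet, sees that the followers lack
    # it, and tells them to remote bootstrap from itself.
    started = []
    if leader_has_tablet:
        for f in followers:
            if not f.has_tablet and f.accept_bootstrap(reject_after_delete):
                started.append(f.name)

    # Step 6: only now does the leader service its own DeleteTablet.
    leader_has_tablet = False
    return started


if __name__ == "__main__":
    print(simulate(reject_after_delete=False))
    print(simulate(reject_after_delete=True))
```

Without the guard, both followers start a bootstrap that cannot succeed; with it, neither does. The sketch ignores RPC timing, consensus state, and KUDU-1328 entirely.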