[ 
https://issues.apache.org/jira/browse/KUDU-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-1337:
-------------------------------------
       Assignee: Mike Percy
    Code Review: http://gerrit.cloudera.org:8080/#/c/2436/

Mike posted a patch for this, so assigning it to him :)

> DeleteTablet can cause spurious unfruitful remote bootstraps
> ------------------------------------------------------------
>
>                 Key: KUDU-1337
>                 URL: https://issues.apache.org/jira/browse/KUDU-1337
>             Project: Kudu
>          Issue Type: Task
>          Components: recovery, tserver
>    Affects Versions: 0.7.0
>            Reporter: Adar Dembo
>            Assignee: Mike Percy
>
> While triaging a cascading YCSB failure, we noticed the following sequence of 
> events:
> # Client deleted a table.
> # Master serviced the request.
> # Master issued DeleteTablet for a particular tablet to a quorum of 3 peers.
> # Due to load or whatever, the followers received and processed the 
> DeleteTablet before the leader.
> # The leader noticed that the followers no longer had the tablet, and told 
> them to remote bootstrap it from itself.
> # The leader began servicing the DeleteTablet.
> # The followers began remote bootstrapping, which killed the leader due to 
> KUDU-1328. If the leader hadn't died, the followers' remote bootstrap 
> sessions would have failed.
> # There's an open question for this step: is any bad "state" left in the 
> followers? Or do the remote bootstrap sessions abort cleanly?
> Anyway, the fact that the replicas handled the DeleteTablet before the leader 
> led to unnecessary remote bootstrap work. We should avoid this.
> Note: Todd suspects that delete_table-test's flakiness may be due to this 
> behavior. I didn't look into it, but whoever tackles this should consider 
> that possibility.
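
The ordering problem in the steps above can be sketched with a toy model. This is not Kudu code: the `Replica` class, the `simulate` function, and the replica names A, B, and C are hypothetical, and the model assumes only what the description states, namely that a leader still hosting the tablet asks any follower that has lost it to remote bootstrap.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical stand-in for a tablet replica; not a Kudu class.
    name: str
    has_tablet: bool = True


def simulate(delivery_order, leader="A"):
    """Return the followers the leader asks to remote bootstrap.

    delivery_order is the order in which replicas process DeleteTablet.
    After each delivery, a leader that still hosts the tablet requests a
    remote bootstrap for every follower that no longer has it.
    """
    replicas = {n: Replica(n) for n in ("A", "B", "C")}
    bootstrap_requests = []
    for name in delivery_order:
        replicas[name].has_tablet = False  # this replica processes DeleteTablet
        if not replicas[leader].has_tablet:
            continue  # leader already deleted its copy, so it drives nothing
        for follower in replicas.values():
            if (follower.name != leader
                    and not follower.has_tablet
                    and follower.name not in bootstrap_requests):
                bootstrap_requests.append(follower.name)
    return bootstrap_requests


followers_first = simulate(["B", "C", "A"])  # the order seen in this issue
leader_first = simulate(["A", "B", "C"])     # leader handles the delete first
print(followers_first)  # ['B', 'C']
print(leader_first)     # []
```

In this model, both followers receive a bootstrap request when they process the delete before the leader, and none do when the leader goes first. It says nothing about how the actual patch avoids the extra work.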



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
