[ 
https://issues.apache.org/jira/browse/KUDU-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo updated KUDU-1353:
-----------------------------
    Summary: Master does not rebuild replica information soft state on failover 
 (was: Alter table sometimes doesn't finish when using multiple masters)

I'm changing the title of the JIRA to be more generic, as there's another 
related issue here: because the replica soft state isn't rebuilt on master 
failover, DeleteTable() requests will succeed but won't delete any tablets 
from tservers until those tablets are present in tablet reports (i.e. via 
tablet role change).
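
To make the DeleteTable() symptom concrete, here is a minimal sketch in Python. It is a toy model, not Kudu code: ToyCatalog, its fields, and its methods are hypothetical names invented for illustration. It assumes only what is described above: after a failover the persisted table metadata is available, but the replica locations are empty until tablet reports refill them.

```python
class ToyCatalog:
    """Hypothetical sketch of master state; names are not Kudu's actual API."""

    def __init__(self, tablets_by_table):
        # "Hard state": persisted, so it survives a master failover.
        self.tablets_by_table = dict(tablets_by_table)
        self.deleted_tablets = set()
        # "Soft state": rebuilt only from tablet reports, empty after failover.
        self.replicas_by_tablet = {}
        self.delete_rpcs = []  # (tserver, tablet_id) pairs sent so far

    def delete_table(self, table):
        """Mark the table deleted and notify every replica we know about."""
        for tablet_id in self.tablets_by_table.pop(table):
            self.deleted_tablets.add(tablet_id)
            for tserver in sorted(self.replicas_by_tablet.get(tablet_id, ())):
                self.delete_rpcs.append((tserver, tablet_id))
        return True  # the client sees success either way

    def process_tablet_report(self, tserver, tablet_ids):
        """Learn replica locations; clean up replicas of deleted tablets."""
        for tablet_id in tablet_ids:
            if tablet_id in self.deleted_tablets:
                self.delete_rpcs.append((tserver, tablet_id))
            else:
                self.replicas_by_tablet.setdefault(tablet_id, set()).add(tserver)
```

In this model, calling delete_table() on a freshly constructed ToyCatalog returns success while delete_rpcs stays empty. An empty incremental report changes nothing, and the delete request only goes out once a report actually lists the tablet.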

> Master does not rebuild replica information soft state on failover
> ------------------------------------------------------------------
>
>                 Key: KUDU-1353
>                 URL: https://issues.apache.org/jira/browse/KUDU-1353
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: master
>    Affects Versions: 0.7.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Critical
>         Attachments: first.txt, second.txt
>
>
> Under certain circumstances, a client's AlterTable request may get "stuck" in 
> a quorum of masters, causing the client to time out. The flaky test dashboard 
> usually shows a couple instances of this failure in 
> master_failover-itest::TestRenameTableSync.
> How does this happen? Let's assume the following:
> * Three masters
> * Table with one tablet, replicated three times.
> # It starts with a complicated master leader election in which master 1 is 
> elected for term 1, master 2 for term 2, then master 1 steps down.
> # This means the three tservers have had a chance to register with both 
> master 1 and master 2. 
> # Now, the tablet's leader replica is elected, causing its next tablet report 
> to include a "dirty" entry to master 2.
> # Master 2 is killed, master 1 is reelected for term 3.
> # The client issues AlterTable to master 1, but master 1 has no idea who the 
> tablet leader is, so the AlterTablet RPC it issues fails.
> # At this point, there are no outstanding master->leader AlterTablet RPCs.
> # The tablet leader begins heartbeating to master 1. Its report is 
> incremental and excludes registration information. Master 1 does not ask the 
> tserver to register, because it already got that information in step 1 during 
> the weird master leader election. The report includes the "tablet report" 
> section but lists no tablets, so the master doesn't ask it to send a full 
> report. No AlterTablet RPC is sent.
> # At this point, the alter table won't make forward progress until the tablet 
> leader's role changes and an AlterTablet RPC is sent, which may not happen 
> for a long time, or at all (in an integration test).
> Some possible solutions:
> # When a tserver decides to heartbeat to a new leader master, it should 
> always send a full tablet report.
> # When the master crafts an AlterTablet RPC and tries to find the leader, it 
> currently uses "soft state" to do so, state that's only updated in the event 
> of a tablet role change or full tablet report. Instead, we could use "hard 
> state" which, even though the master may be newly elected, should already 
> include up-to-date consensus configuration information.
> # Change tservers to always heartbeat to all masters. Doing so means that 
> following a master leader election, the new master has up-to-date "soft 
> state" and can find the leader when issuing an AlterTablet RPC.
> # Add a list of tservers to the master's "hard state", so that TSDescriptors 
> (needed when finding the leader tablet) can be instantiated without the need 
> for full tablet reports.
> At the moment I'm not sure which approach makes the most sense, or if there's 
> a better one out there.
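
The stuck AlterTable scenario quoted above, and the first proposed solution (always send a full tablet report to a new leader master), can be sketched with a toy model in Python. This is not Kudu code: ToyMaster, heartbeat(), and every field name are hypothetical, and the model assumes a report is reduced to the set of tablets for which the tserver hosts the leader replica.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for a Kudu master; not the real implementation."""
    # Soft state: tablet id -> tserver believed to host the leader replica.
    # It lives only in memory, so a newly elected master starts with it empty.
    leader_by_tablet: dict = field(default_factory=dict)
    pending_alters: set = field(default_factory=set)
    sent_rpcs: list = field(default_factory=list)

    def alter_table(self, tablet_id):
        """Record the alter and try to push it to the tablet leader."""
        self.pending_alters.add(tablet_id)
        return self._try_send(tablet_id)

    def _try_send(self, tablet_id):
        leader = self.leader_by_tablet.get(tablet_id)
        if leader is None:
            return False  # leader unknown: nothing can be sent
        self.sent_rpcs.append((leader, tablet_id))
        self.pending_alters.discard(tablet_id)
        return True

    def process_tablet_report(self, tserver, leader_tablets):
        """Apply a report listing tablets for which `tserver` is the leader."""
        for tablet_id in leader_tablets:
            self.leader_by_tablet[tablet_id] = tserver
            if tablet_id in self.pending_alters:
                self._try_send(tablet_id)


def heartbeat(master, tserver, led_tablets, dirty_tablets, full_report):
    """Send either a full report or an incremental one (dirty tablets only)."""
    reported = led_tablets if full_report else led_tablets & dirty_tablets
    master.process_tablet_report(tserver, reported)
```

With a fresh ToyMaster standing in for the re-elected master 1, alter_table("t1") returns False. A heartbeat with full_report=False and no dirty tablets leaves the alter pending, which mirrors the stuck state. The same heartbeat with full_report=True repopulates the leader map and the pending AlterTablet goes out.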



