[ 
https://issues.apache.org/jira/browse/KUDU-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196343#comment-15196343
 ] 

Adar Dembo commented on KUDU-1353:
----------------------------------

I've suggested another solution in the [multi-master design 
doc|http://gerrit.cloudera.org:8080/#/c/2527]: rebuild the master's cached 
per-tablet replica information from hard state whenever a new leader master is 
elected. It is this cache that is consulted in step 5 to figure out where to 
send the AlterTablet RPC. I think this is the best approach, because it 
acknowledges that tablet replica information _is_ replicated but is currently 
not used on failover.
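
To make the proposal concrete, here is a minimal sketch of the idea. It is written in Python rather than Kudu's C++, and every name in it (PersistedTablet, ReplicaCache, on_elected_leader) is hypothetical, not Kudu's actual API. It also assumes the replicated hard state records each tablet's leader as part of its consensus configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PersistedTablet:
    """Hypothetical stand-in for one tablet's replicated ("hard") state."""
    peer_uuids: List[str]
    # Assumption: the persisted consensus configuration names the leader.
    leader_uuid: Optional[str] = None


@dataclass
class ReplicaCache:
    """Hypothetical in-memory ("soft") per-tablet replica information."""
    leaders: Dict[str, str] = field(default_factory=dict)

    def rebuild_from_hard_state(self, hard_state: Dict[str, PersistedTablet]) -> None:
        # Discard anything cached under an earlier term, then repopulate
        # from state that every master already holds via replication.
        self.leaders.clear()
        for tablet_id, tablet in hard_state.items():
            if tablet.leader_uuid is not None:
                self.leaders[tablet_id] = tablet.leader_uuid

    def find_leader(self, tablet_id: str) -> Optional[str]:
        return self.leaders.get(tablet_id)


def on_elected_leader(cache: ReplicaCache,
                      hard_state: Dict[str, PersistedTablet]) -> None:
    """Hypothetical hook run once when this master wins an election."""
    cache.rebuild_from_hard_state(hard_state)


# Usage: before the rebuild the new leader master cannot address the RPC;
# afterwards it can, without waiting for any tablet report.
hard_state = {"tablet-1": PersistedTablet(["ts-a", "ts-b", "ts-c"], leader_uuid="ts-b")}
cache = ReplicaCache()
leader_before = cache.find_leader("tablet-1")
on_elected_leader(cache, hard_state)
leader_after = cache.find_leader("tablet-1")
```

The point of the sketch is only that the lookup in step 5 no longer depends on what the tservers happen to report after failover.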

> Alter table sometimes doesn't finish when using multiple masters
> ----------------------------------------------------------------
>
>                 Key: KUDU-1353
>                 URL: https://issues.apache.org/jira/browse/KUDU-1353
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: master
>    Affects Versions: 0.7.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Critical
>         Attachments: first.txt, second.txt
>
>
> Under certain circumstances, a client's AlterTable request may get "stuck" in 
> a quorum of masters, causing the client to time out. The flaky test dashboard 
> usually shows a couple of instances of this failure in 
> master_failover-itest::TestRenameTableSync.
> How does this happen? Let's assume the following:
> * Three masters
> * Table with one tablet, replicated three times.
> # It starts with a complicated master leader election in which master 1 is 
> elected for term 1, master 2 for term 2, then master 1 steps down.
> # This means the three tservers have had a chance to register with both 
> master 1 and master 2. 
> # Now, the tablet's leader replica is elected, causing its next tablet report 
> to include a "dirty" entry to master 2.
> # Master 2 is killed, master 1 is reelected for term 3.
> # The client issues AlterTable to master 1, but master 1 has no idea who the 
> tablet leader is, so the AlterTablet RPC it issues fails.
> # At this point, there are no outstanding master->leader AlterTablet RPCs.
> # The tablet leader begins heartbeating to master 1. Its report is 
> incremental and excludes registration information. Master 1 does not ask the 
> tserver to register, because it already got that information in step 1 during 
> the weird master leader election. The report includes the "tablet report" 
> section but lists no tablets, so the master doesn't ask it to send a full 
> report. No AlterTablet RPC is sent.
> # At this point, the alter table won't make forward progress until the tablet 
> leader's role changes and an AlterTablet RPC is sent, which may not happen 
> for a long time, or at all (in an integration test).
> Some possible solutions:
> # When a tserver decides to heartbeat to a new leader master, it should 
> always send a full tablet report.
> # When the master crafts an AlterTablet RPC and tries to find the leader, it 
> currently uses "soft state" to do so, state that's only updated in the event 
> of a tablet role change or full tablet report. Instead, we could use "hard 
> state" which, even though the master may be newly elected, should already 
> include up-to-date consensus configuration information.
> # Change tservers to always heartbeat to all masters. Doing so means that 
> following a master leader election, the new master has up-to-date "soft 
> state" and can find the leader when issuing an AlterTablet RPC.
> # Add a list of tservers to the master's "hard state", so that TSDescriptors 
> (needed when finding the leader tablet) can be instantiated without the need 
> for full tablet reports.
> At the moment I'm not sure which approach makes the most sense, or if there's 
> a better one out there.
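
The failure sequence in the quoted description, and the effect of the first possible solution (a full tablet report whenever a tserver switches to a new leader master), can be reproduced with a toy model. This is a rough Python sketch under simplifying assumptions: one tserver, one tablet, and hypothetical classes (TabletReport, Master, TServer) that only imitate the behaviour described in the issue. It is not Kudu code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TabletReport:
    is_incremental: bool
    # tablet id -> True if the reporting tserver leads that tablet
    tablets: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Master:
    leader_of: Dict[str, str] = field(default_factory=dict)  # soft state
    sent_alters: List[str] = field(default_factory=list)
    pending_alter: Optional[str] = None

    def alter_table(self, tablet_id: str) -> None:
        leader = self.leader_of.get(tablet_id)
        if leader is None:
            self.pending_alter = tablet_id  # no known leader: nothing is sent
        else:
            self.sent_alters.append(leader)

    def heartbeat(self, ts_uuid: str, report: TabletReport) -> None:
        for tablet_id, is_leader in report.tablets.items():
            if is_leader:
                self.leader_of[tablet_id] = ts_uuid
                if self.pending_alter == tablet_id:
                    self.sent_alters.append(ts_uuid)
                    self.pending_alter = None


@dataclass
class TServer:
    uuid: str
    tablets: Dict[str, bool]
    dirty: Set[str] = field(default_factory=set)
    last_master: Optional[str] = None
    full_report_on_new_master: bool = False  # models possible solution 1

    def build_report(self, master_id: str) -> TabletReport:
        switched = master_id != self.last_master
        self.last_master = master_id
        if switched and self.full_report_on_new_master:
            report = TabletReport(False, dict(self.tablets))
        else:
            # Incremental: only tablets whose state changed since last report.
            report = TabletReport(True, {t: self.tablets[t] for t in self.dirty})
        self.dirty.clear()
        return report


def run(full_report_on_new_master: bool) -> List[str]:
    ts = TServer("ts-b", {"tablet-1": True}, dirty={"tablet-1"},
                 full_report_on_new_master=full_report_on_new_master)
    master1, master2 = Master(), Master()
    # The "dirty" entry goes to master 2, which is then killed.
    master2.heartbeat(ts.uuid, ts.build_report("master-2"))
    # Master 1 is reelected and receives the client's AlterTable.
    master1.alter_table("tablet-1")
    # The tablet leader starts heartbeating to master 1.
    master1.heartbeat(ts.uuid, ts.build_report("master-1"))
    return master1.sent_alters
```

In this model run(False) leaves the alter pending indefinitely, because the incremental report to master 1 lists no tablets, while run(True) lets master 1 learn the leader from the full report and send the AlterTablet RPC.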



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
