[
https://issues.apache.org/jira/browse/KUDU-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jean-Daniel Cryans updated KUDU-1353:
-------------------------------------
Target Version/s: 0.9.0 (was: 0.8.0)
Adar told me I can move it to 0.9.0.
> Master does not rebuild replica information soft state on failover
> ------------------------------------------------------------------
>
> Key: KUDU-1353
> URL: https://issues.apache.org/jira/browse/KUDU-1353
> Project: Kudu
> Issue Type: Sub-task
> Components: master
> Affects Versions: 0.7.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Critical
> Attachments: first.txt, second.txt
>
>
> Under certain circumstances, a client's AlterTable request may get "stuck" in
> a quorum of masters, causing the client to time out. The flaky test dashboard
> usually shows a couple instances of this failure in
> master_failover-itest::TestRenameTableSync.
> How does this happen? Let's assume the following:
> * Three masters
> * Table with one tablet, replicated three times.
> # It starts with a complicated master leader election in which master 1 is
> elected for term 1, master 2 for term 2, then master 1 steps down.
> # This means the three tservers have had a chance to register with both
> master 1 and master 2.
> # Now, the tablet's leader replica is elected, causing its next tablet report
> to include a "dirty" entry to master 2.
> # Master 2 is killed, master 1 is reelected for term 3.
> # The client issues AlterTable to master 1, but master 1 has no idea who the
> tablet leader is, so the AlterTablet RPC it issues fails.
> # At this point, there are now outstanding master->leader AlterTablet RPCs.
> # The tablet leader begins heartbeating to master 1. Its report is
> incremental and excludes registration information. Master 1 does not ask the
> tserver to register, because it already got that information in step 1 during
> the weird master leader election. The report includes the "tablet report"
> section but lists no tablets, so the master doesn't ask it to send a full
> report. No AlterTablet RPC is sent.
> # At this point, the alter table won't make forward progress until the tablet
> leader's role changes and an AlterTablet RPC is sent, which may not happen
> for a long time, or at all (in an integration test).
> Some possible solutions:
> # When a tserver decides to heartbeat to a new leader master, it should
> always send a full tablet report.
> # When the master crafts an AlterTablet RPC and tries to find the leader, it
> currently uses "soft state" to do so, state that's only updated in the event
> of a tablet role change or full tablet report. Instead, we could use "hard
> state" which, even though the master may be newly elected, should already
> include up-to-date consensus configuration information.
> # Change tservers to always heartbeat to all masters. Doing so means that
> following a master leader election, the new master has up-to-date "soft
> state" and can find the leader when issuing an AlterTablet RPC.
> # Add a list of tservers to the master's "hard state", so that TSDescriptors
> (needed when finding the leader tablet) can be instantiated without the need
> for full tablet reports.
> At the moment I'm not sure which approach makes the most sense, or if there's
> a better one out there.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)