[
https://issues.apache.org/jira/browse/KUDU-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jean-Daniel Cryans updated KUDU-1358:
-------------------------------------
Target Version/s: 0.9.0 (was: 0.8.0)
Adar told me I can move it to 0.9.0.
> Following a master leader election, create table may fail
> ---------------------------------------------------------
>
> Key: KUDU-1358
> URL: https://issues.apache.org/jira/browse/KUDU-1358
> Project: Kudu
> Issue Type: Sub-task
> Components: master
> Affects Versions: 0.7.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Critical
>
> In the current multi-master design and implementation, tservers only
> heartbeat to the leader master. After a master leader election, there's a
> short window of time in which the new leader master may not be aware of the
> existence of some (or even all) of the tservers. Attempts to create a table
> during this window may fail, as the tservers known to the new leader master
> may be too few to satisfy the new table's replication factor. Whether the
> window exists in the first place depends on whether the new leader master had
> been leader before, and whether any of the tservers had sent heartbeats to it
> during that time.
> Some possible solutions include:
> # Modifying the heartbeat protocol so that tservers heartbeat to _all_
> masters, leaders and followers alike. Doing this will ensure that the "soft
> state" belonging to any master is always up-to-date at the cost of network
> bandwidth lost to heartbeating. Additionally, changes may need to be made to
> ensure that a follower master can't cause a tserver to take any actions.
> # Never actually failing a create table request due to too few tservers,
> instead allowing it to linger until such a time when more tservers exist. For
> this to actually be practical we'd need to allow clients to "cancel" a
> previously issued create table request.
> Both approaches probably include additional ramifications; this problem needs
> to be thought through carefully.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)