Jie Yu created MESOS-1247:
-----------------------------
Summary: Log writer who loses the election should remember the
highest proposal number seen
Key: MESOS-1247
URL: https://issues.apache.org/jira/browse/MESOS-1247
Project: Mesos
Issue Type: Bug
Affects Versions: 0.19.0
Reporter: Jie Yu
Fix For: 0.19.0
Say a log writer loses an election using proposal number 1 because some replica
has already promised to proposal number 2000.
The second time this log writer tries to get elected, it should use a proposal
number of at least 2001. However, the current log writer implementation creates
a new coordinator on every retry, and does not pass along the highest proposal
number seen in the previous rounds. As a result, this log writer may fail to
get elected even when it should be able to.
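To make the intended behavior concrete, here is a minimal sketch of state that
survives across retries. It is not the Mesos implementation: PromiseResponse
and ProposalTracker are hypothetical names, and the sketch only assumes that a
rejecting replica reports the proposal number it has promised to.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a replica's reply to a promise request.
struct PromiseResponse {
  bool okay;          // True if the replica promised to our proposal.
  uint64_t proposal;  // Highest proposal number the replica has promised to.
};

// Hypothetical writer-side state that is kept across election retries
// instead of being thrown away with the coordinator.
class ProposalTracker {
public:
  // Remember the highest proposal number seen in any reply, whether the
  // replica accepted or rejected our proposal.
  void observe(const PromiseResponse& response) {
    highestSeen = std::max(highestSeen, response.proposal);
  }

  // Pick a proposal number strictly greater than anything seen so far,
  // and remember it so that the following retry goes higher still.
  uint64_t next() {
    highestSeen += 1;
    return highestSeen;
  }

private:
  uint64_t highestSeen = 0;
};
```

With this, a writer that proposed 1 and was rejected by a replica promised to
2000 would retry with 2001 rather than starting over.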
There are a couple of solutions. We may want to re-use the coordinator for the
log writer (do not create a new coordinator every time) as BenH suggested in
the comments:
{noformat}
Future<Option<Log::Position> > LogWriterProcess::_start()
{
  // We delete the existing coordinator (if exists) and create a new
  // coordinator each time 'start' is called.
  // TODO(benh): We shouldn't need to delete the coordinator everytime.
  delete coordinator;
  error = None();
  CHECK_READY(recovering);
  coordinator = new Coordinator(quorum, recovering.get(), network);
  ...
}
{noformat}
However, we need to be careful to handle the discard semantics (when a timeout
happens) so that a second elect call to the coordinator does not start until
the first one is properly cancelled.
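One way to picture the ordering constraint is as a small state machine. This
is only a sketch: ElectionGuard and its methods are hypothetical and do not
correspond to the Coordinator or libprocess APIs. It assumes the in-flight
round reports back once it has finished or acknowledged the cancellation.

```cpp
// Hypothetical guard that serializes elect attempts on a re-used
// coordinator. Not part of Mesos; for illustration only.
class ElectionGuard {
public:
  enum class State { IDLE, ELECTING, CANCELLING };

  // Try to begin a new round. Refuses while a previous round is still
  // running or has been discarded but not yet fully cancelled.
  bool tryStart() {
    if (state != State::IDLE) {
      return false;
    }
    state = State::ELECTING;
    return true;
  }

  // Called when the timeout fires: request cancellation of the round.
  void discard() {
    if (state == State::ELECTING) {
      state = State::CANCELLING;
    }
  }

  // Called once the in-flight round has completed or has acknowledged
  // the cancellation. Only now may another round start.
  void finished() {
    state = State::IDLE;
  }

  State current() const { return state; }

private:
  State state = State::IDLE;
};
```

The point of the CANCELLING state is that a discard request alone is not
enough to allow a retry; the retry waits for finished().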
--
This message was sent by Atlassian JIRA
(v6.2#6252)