Jie Yu created MESOS-1247:
-----------------------------

             Summary: Log writer who loses the election should remember the 
highest proposal number seen
                 Key: MESOS-1247
                 URL: https://issues.apache.org/jira/browse/MESOS-1247
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 0.19.0
            Reporter: Jie Yu
             Fix For: 0.19.0


Say a log writer loses an election with proposal number 1 because some replica 
has already promised proposal number 2000.

The second time this log writer tries to get elected, it should use a proposal 
number of at least 2001. However, the current log writer implementation creates 
a new coordinator on every retry and does not pass along the highest proposal 
number seen in previous rounds. As a result, the log writer may fail to get 
elected even though it should be able to.
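
To make the failure mode concrete, here is a minimal, self-contained sketch of 
a writer that remembers the highest proposal number across retries. It is not 
the Mesos implementation: Replica, ElectResult, elect and Writer are 
hypothetical names, and the promise phase is reduced to a synchronous loop over 
in-memory replicas.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified replica: it only tracks the highest proposal
// number it has promised so far.
struct Replica {
  uint64_t promised = 0;

  // Grants the promise only if 'proposal' is strictly higher than anything
  // promised before; otherwise rejects and leaves the state unchanged.
  bool promise(uint64_t proposal) {
    if (proposal <= promised) return false;
    promised = proposal;
    return true;
  }
};

// Outcome of one election round (hypothetical type).
struct ElectResult {
  bool elected;
  uint64_t highestSeen;  // Highest proposal number observed in this round.
};

// Runs one election round: asks every replica for a promise and records the
// highest proposal number any replica reports.
ElectResult elect(std::vector<Replica>& replicas,
                  uint64_t proposal,
                  std::size_t quorum) {
  assert(proposal > 0);
  std::size_t promises = 0;
  uint64_t highest = proposal;
  for (Replica& replica : replicas) {
    if (replica.promise(proposal)) ++promises;
    highest = std::max(highest, replica.promised);
  }
  return {promises >= quorum, highest};
}

// Writer that keeps its proposal number between retries, so every new
// attempt starts above the highest number seen in earlier rounds.
struct Writer {
  uint64_t proposal = 0;

  bool tryElect(std::vector<Replica>& replicas, std::size_t quorum) {
    ++proposal;
    ElectResult result = elect(replicas, proposal, quorum);
    proposal = std::max(proposal, result.highestSeen);
    return result.elected;
  }
};
```

With three replicas, two of which have promised 2000, and a quorum of 2, the 
first tryElect call uses proposal 1 and loses, and the second uses 2001 and 
wins. A writer whose state is reset on every retry would propose 1 again and 
keep losing, which is the behaviour described above.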

There are a couple of possible solutions. One is to reuse the coordinator for 
the log writer (rather than creating a new coordinator every time), as BenH 
suggested in the comments:
{noformat}
Future<Option<Log::Position> > LogWriterProcess::_start()
{
  // We delete the existing coordinator (if exists) and create a new
  // coordinator each time 'start' is called.
  // TODO(benh): We shouldn't need to delete the coordinator everytime.
  delete coordinator;
  error = None();

  CHECK_READY(recovering);

  coordinator = new Coordinator(quorum, recovering.get(), network);
  ...
}
{noformat}

However, we need to be careful to handle the discard semantics (when a timeout 
happens) so that a second elect call to the coordinator does not start until 
the first one has been properly cancelled.
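
One way to picture that constraint is a small state guard on a reused 
coordinator. The sketch below is only an illustration under assumed names 
(ReusableCoordinator, beginElect, discard and finished are hypothetical and not 
part of Mesos); a real fix would have to express the same ordering with 
futures.

```cpp
#include <stdexcept>

// Hypothetical guard for a coordinator that is reused across retries.
class ReusableCoordinator {
public:
  enum class State { IDLE, ELECTING, CANCELLING };

  // Starts an election round. Refuses to start while a previous round is
  // still running or is still being cancelled.
  void beginElect() {
    if (state_ != State::IDLE) {
      throw std::logic_error("previous election has not finished");
    }
    state_ = State::ELECTING;
  }

  // Called on timeout: requests cancellation of the running round. The
  // round is not considered over until 'finished' is called.
  void discard() {
    if (state_ == State::ELECTING) state_ = State::CANCELLING;
  }

  // Called once the round has fully stopped (completed or cancelled).
  void finished() { state_ = State::IDLE; }

  State state() const { return state_; }

private:
  State state_ = State::IDLE;
};
```

In this model, calling beginElect after discard but before finished throws, so 
a retry cannot overlap with a round that is still being cancelled.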



