Jie Yu created MESOS-1165:
-----------------------------

             Summary: Retry required when recovering an empty log
                 Key: MESOS-1165
                 URL: https://issues.apache.org/jira/browse/MESOS-1165
             Project: Mesos
          Issue Type: Bug
            Reporter: Jie Yu
            Priority: Minor


Reported by [~benjaminhindman]. It's fairly non-intuitive that a 'fill' retry
is required when recovering an empty log. Moreover, since the retry is done
via a 'delay', you can't pause the clock before calling
Log::Writer::start! The following tests show the multiple calls and at one
point I added comments to explain the very esoteric reasoning here. Here is
the sequence of events:

First, a replica is recovered with an empty log, but position 0 is always
counted as a hole:

----
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
----

At this point the replica assumes it was promised to a coordinator with 
proposal 0 (that's the default metadata). Then an implicit promise request is 
made with proposal 1.

----
Replica received implicit promise request with proposal 1 
----

Then the coordinator (via FillProcess) tries to fill the hole (position 0) 
explicitly:

----
Coordinator attemping to fill missing position 
----

And the replica receives the request:

----
Replica received explicit promise request for position 0 with proposal 1
----

But the fill must be retried: the 0th position is already implicitly promised
to proposal 1 (the same coordinator!), and the replica won't allow an explicit
promise with that same proposal number (because it might not be safe). So the
FillProcess now tries again with proposal number 2 (after the delay). While
correct, this seems unfortunate (and not intuitive).
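
The sequence above can be reproduced with a small toy model. This is only a
sketch, not the Mesos implementation: ToyReplica, fill_with_retry and the
exact rejection rule (a proposal is rejected unless it is strictly higher
than the one already promised) are assumptions made for illustration, and the
real retry happens after a 'delay' rather than in a loop.

```python
class ToyReplica:
    """Hypothetical stand-in for a log replica (not the Mesos class)."""

    def __init__(self):
        # Default metadata: the replica assumes it was promised to proposal 0.
        self.promised = 0
        # Per-position promises granted through explicit promise requests.
        self.position_promises = {}

    def implicit_promise(self, proposal):
        # Assumed rule: only a strictly higher proposal is accepted.
        if proposal <= self.promised:
            return False
        self.promised = proposal
        return True

    def explicit_promise(self, position, proposal):
        # Assumed rule: a proposal equal to the already promised one is
        # rejected, even when it comes from the same coordinator.
        if proposal <= self.promised:
            return False
        self.position_promises[position] = proposal
        return True


def fill_with_retry(replica, position, proposal, max_attempts=3):
    """Try to fill a hole, bumping the proposal number after each rejection."""
    attempts = []
    for _ in range(max_attempts):
        accepted = replica.explicit_promise(position, proposal)
        attempts.append((proposal, accepted))
        if accepted:
            break
        proposal += 1  # in the real log this retry is scheduled via a delay
    return attempts


replica = ToyReplica()
replica.implicit_promise(1)                # coordinator elected with proposal 1
attempts = fill_with_retry(replica, 0, 1)  # fill the hole at position 0
print(attempts)                            # [(1, False), (2, True)]
```

In this model the first fill attempt reuses proposal 1 and is rejected, and
only the retry with proposal 2 succeeds, which is the extra round trip
described above.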



--
This message was sent by Atlassian JIRA
(v6.2#6252)
