Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.

Jie Yu Thu, 29 Jan 2015 10:42:58 -0800


> On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 1794-1815
> > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1794>
> >
> >     What about adding a method in slave/state.hpp for checkpointing 
> > Resources?
> >     
> >     ```
> >     Try<Nothing> checkpoint(
> >         const std::string& path,
> >         const Resources& resources);
> >     ```


This is done in https://reviews.apache.org/r/29918/


> On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 1817-1826
> > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1817>
> >
> >     Maybe a little NOTE here that we assume that messages are ordered for 
> > the releasing to be correct? And that ordering is technically not 
> > guaranteed, maybe pointing to the relevant tickets?
> >     
> >     Just want to make sure we know what needs to be done to prevent this 
> > from ever biting us.

Added a TODO here explaining the details about the ordering issue.


> On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1830
> > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1830>
> >
> >     "Updated persistent resources to Y"?
> >     
> >     Or
> >     
> >     "Updated persistent resources from X to Y"?

Fixed.


> On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 3700
> > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line3700>
> >
> >     Shouldn't this CHECK exist in updateResources instead of here? Or in 
> > both places?
> >     
> >     Otherwise the slave will create a situation where it will fail CHECKs 
> > when it next recovers, is there something else I'm missing?

Doesn't apply any more. We want to get dynamic reservation done as well.


> On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 1789-1791
> > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1789>
> >
> >     Just so I understand, we won't be deleting anything, right? We'll leave 
> > volumes "dangling" only when the master fails over during the issue you 
> > described here. In that case, the master also thinks that there are no 
> > persisted resources on the slave.
> >     
> >     And we won't delete the unknown volumes, we'll leave them dangling on 
> > the filesystem, right?. Can you please file a ticket to capture this issue 
> > and link it here and in the epic? Don't want to leave it unfixed.
> >     
> >     If the master is up, it should re-send the persistent resources at 
> > which point the slave gets them back during re-registration, right?. During 
> > registration however, the problem still exists, right?
> >     
> >     Let's document the details here and/or in a ticket!

This TODO has been resolved by using an all-or-nothing atomic checkpoint 
function.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28809/#review67131
-----------------------------------------------------------


On Dec. 8, 2014, 10:19 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28809/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2014, 10:19 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Bugs: MESOS-2031
>     https://issues.apache.org/jira/browse/MESOS-2031
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Started to maintain and checkpoint persisted resource in slave. That includes:
> 1) responds to update resources message
> 2) checkpoint resources
> 3) recover checkpointed resources
> 4) send checkpointed resources during register/reregister
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 
>   src/slave/slave.cpp 9ac64589c353b2f17f538db7de01faa55b2369b9 
> 
> Diff: https://reviews.apache.org/r/28809/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>

Re: Review Request 28809: Started to maintain and checkpoint persisted resource in slave.

Reply via email to