> On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote: > > src/slave/slave.cpp, lines 1794-1815 > > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1794> > > > > What about adding a method in slave/state.hpp for checkpointing > > Resources? > > > > ``` > > Try<Nothing> checkpoint( > > const std::string& path, > > const Resources& resources); > > ```
This is done in https://reviews.apache.org/r/29918/ > On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote: > > src/slave/slave.cpp, lines 1817-1826 > > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1817> > > > > Maybe a little NOTE here that we assume that messages are ordered for > > the releasing to be correct? And that ordering is technically not > > guaranteed, maybe pointing to the relevant tickets? > > > > Just want to make sure we know what needs to be done to prevent this > > from ever biting us. Added a TODO here explaining the details about the ordering issue. > On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote: > > src/slave/slave.cpp, line 1830 > > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1830> > > > > "Updated persistent resources to Y"? > > > > Or > > > > "Updated persistent resources from X to Y"? Fixed. > On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote: > > src/slave/slave.cpp, line 3700 > > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line3700> > > > > Shouldn't this CHECK exist in updateResources instead of here? Or in > > both places? > > > > Otherwise the slave will create a situation where it will fail CHECKs > > when it next recovers, is there something else I'm missing? Doesn't apply any more. We want to get dynamic reservation done as well. > On Jan. 8, 2015, 12:28 a.m., Ben Mahler wrote: > > src/slave/slave.cpp, lines 1789-1791 > > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1789> > > > > Just so I understand, we won't be deleting anything, right? We'll leave > > volumes "dangling" only when the master fails over during the issue you > > described here. In that case, the master also thinks that there are no > > persisted resources on the slave. > > > > And we won't delete the unknown volumes, we'll leave them dangling on > > the filesystem, right?. Can you please file a ticket to capture this issue > > and link it here and in the epic? Don't want to leave it unfixed. > > > > If the master is up, it should re-send the persistent resources at > > which point the slave gets them back during re-registration, right?. During > > registration however, the problem still exists, right? > > > > Let's document the details here and/or in a ticket! This TODO has been resolved by using an all-or-nothing atomic checkpoint function. - Jie ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/#review67131 ----------------------------------------------------------- On Dec. 8, 2014, 10:19 p.m., Jie Yu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28809/ > ----------------------------------------------------------- > > (Updated Dec. 8, 2014, 10:19 p.m.) > > > Review request for mesos and Ben Mahler. > > > Bugs: MESOS-2031 > https://issues.apache.org/jira/browse/MESOS-2031 > > > Repository: mesos-git > > > Description > ------- > > Started to maintain and checkpoint persisted resource in slave. That includes: > 1) responds to update resources message > 2) checkpoint resources > 3) recover checkpointed resources > 4) send checkpointed resources during register/reregister > > > Diffs > ----- > > src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 > src/slave/slave.cpp 9ac64589c353b2f17f538db7de01faa55b2369b9 > > Diff: https://reviews.apache.org/r/28809/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Jie Yu > >
