----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28809/#review67131 -----------------------------------------------------------
src/slave/slave.cpp <https://reviews.apache.org/r/28809/#comment110970> Just so I understand, we won't be deleting anything, right? We'll leave volumes "dangling" only when the master fails over during the issue you described here. In that case, the master also thinks that there are no persisted resources on the slave. And we won't delete the unknown volumes, we'll leave them dangling on the filesystem, right?. Can you please file a ticket to capture this issue and link it here and in the epic? Don't want to leave it unfixed. If the master is up, it should re-send the persistent resources at which point the slave gets them back during re-registration, right?. During registration however, the problem still exists, right? Let's document the details here and/or in a ticket! src/slave/slave.cpp <https://reviews.apache.org/r/28809/#comment110972> What about adding a method in slave/state.hpp for checkpointing Resources? ``` Try<Nothing> checkpoint( const std::string& path, const Resources& resources); ``` src/slave/slave.cpp <https://reviews.apache.org/r/28809/#comment110973> Maybe a little NOTE here that we assume that messages are ordered for the releasing to be correct? And that ordering is technically not guaranteed, maybe pointing to the relevant tickets? Just want to make sure we know what needs to be done to prevent this from ever biting us. src/slave/slave.cpp <https://reviews.apache.org/r/28809/#comment110974> "Updated persistent resources to Y"? Or "Updated persistent resources from X to Y"? src/slave/slave.cpp <https://reviews.apache.org/r/28809/#comment110977> Shouldn't this CHECK exist in updateResources instead of here? Or in both places? Otherwise the slave will create a situation where it will fail CHECKs when it next recovers, is there something else I'm missing? - Ben Mahler On Dec. 8, 2014, 10:19 p.m., Jie Yu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28809/ > ----------------------------------------------------------- > > (Updated Dec. 8, 2014, 10:19 p.m.) > > > Review request for mesos and Ben Mahler. > > > Bugs: MESOS-2031 > https://issues.apache.org/jira/browse/MESOS-2031 > > > Repository: mesos-git > > > Description > ------- > > Started to maintain and checkpoint persisted resource in slave. That includes: > 1) responds to update resources message > 2) checkpoint resources > 3) recover checkpointed resources > 4) send checkpointed resources during register/reregister > > > Diffs > ----- > > src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6 > src/slave/slave.cpp 9ac64589c353b2f17f538db7de01faa55b2369b9 > > Diff: https://reviews.apache.org/r/28809/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Jie Yu > >
