> On Dec. 22, 2014, 5:43 a.m., Adam B wrote:
> > src/slave/slave.cpp, line 1787
> > <https://reviews.apache.org/r/28809/diff/2/?file=785913#file785913line1787>
> >
> > What is the source of truth for persistent resources? The checkpoint or
> > the master? What if a framework is trying to launch a task on a slave with
> > a new persistent disk resource, while at the same time, the slave is
> > restarted with new persistent disks added by the slave operator? Assume we
> > can update slave resources without invalidating the SlaveID, and that the
> > master has already started processing the launchTask when the slave tries
> > to reregister.
> > I'm guessing the slave would reject the UpdateResources call until the
> > slave has successfully re-registered with the master, so the master would
> > have the updated persistentResource, which it would then update with the
> > newly launched task's persistent disk.

Good question! In the current design, the source of truth is the master. There are several reasons for this choice, but the main motivation is that the master sits at the center of the cluster (between the frameworks and the slaves). Keeping the source of truth in the master makes it easy for us to maintain a consistent state for the cluster.

Since the master can fail over, the slave needs to checkpoint persistent volumes and dynamic reservations and report them back to the master after a master failover. A newly registered slave also needs to tell the master about its persistent volumes and dynamic reservations, since the master knows nothing about a new slave.

Now, if an operator wants to add a new persistent volume, he/she has to start the slave as a new slave (new SlaveID). This is similar to changing SlaveInfo right now. Once we have a way to update SlaveInfo (MESOS-1739), we could use a similar mechanism to update persistent volumes and dynamic reservations.

- Jie

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28809/#review65755
-----------------------------------------------------------

On Dec. 8, 2014, 10:19 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28809/
> -----------------------------------------------------------
>
> (Updated Dec. 8, 2014, 10:19 p.m.)
>
>
> Review request for mesos and Ben Mahler.
>
>
> Bugs: MESOS-2031
>     https://issues.apache.org/jira/browse/MESOS-2031
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Started to maintain and checkpoint persisted resource in slave.
> That includes:
> 1) responds to update resources message
> 2) checkpoint resources
> 3) recover checkpointed resources
> 4) send checkpointed resources during register/reregister
>
>
> Diffs
> -----
>
>   src/slave/slave.hpp 70bd8c1fde4ea09fa54c76aa93424a1adb0309f6
>   src/slave/slave.cpp 9ac64589c353b2f17f538db7de01faa55b2369b9
>
> Diff: https://reviews.apache.org/r/28809/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>
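
For readers following the thread, the four steps in the description above (handle the update-resources message, checkpoint, recover, report on register/reregister) might be sketched roughly as below. This is only an illustration of the flow under stated assumptions, not the code in the diff: `SlaveSketch`, `PersistedResources`, `registrationPayload`, and the plain-text checkpoint format are all hypothetical names and choices made up for the example, and the real slave uses Mesos' own `Resources` type and checkpointing helpers.

```cpp
#include <fstream>
#include <map>
#include <string>

// Hypothetical stand-in for Mesos' Resources: volume id -> size in MB.
// Assumes volume ids contain no whitespace.
using PersistedResources = std::map<std::string, long>;

// Hypothetical slave-side holder; NOT the real mesos slave class.
class SlaveSketch {
public:
  explicit SlaveSketch(const std::string& checkpointPath)
    : path_(checkpointPath) {}

  // Steps 1 and 2: the master is the source of truth, so the slave takes
  // the master's view as-is and checkpoints it before adopting it.
  // (A real implementation would write to a temp file and rename it so a
  // crash cannot leave a half-written checkpoint.)
  bool updateResources(const PersistedResources& fromMaster) {
    std::ofstream out(path_, std::ios::trunc);
    if (!out) {
      return false;
    }
    for (const auto& entry : fromMaster) {
      out << entry.first << ' ' << entry.second << '\n';
    }
    out.flush();
    if (!out) {
      return false;
    }
    resources_ = fromMaster;
    return true;
  }

  // Step 3: after a restart, reload whatever was last checkpointed.
  bool recover() {
    std::ifstream in(path_);
    if (!in) {
      return false;
    }
    PersistedResources loaded;
    std::string id;
    long megabytes = 0;
    while (in >> id >> megabytes) {
      loaded[id] = megabytes;
    }
    resources_ = loaded;
    return true;
  }

  // Step 4: what the slave would attach to its register/reregister
  // message so a new or failed-over master can rebuild its state.
  const PersistedResources& registrationPayload() const {
    return resources_;
  }

private:
  std::string path_;
  PersistedResources resources_;
};
```

In this sketch a restarted slave starts with an empty payload, calls `recover()`, and only then reports to the master, which mirrors the point made in the reply: the checkpoint exists so the master can be told again what it previously decided, not to act as a second source of truth.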
