> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> >     I've been puzzling over why the return is an issue with this revision
> >     since it wasn't with earlier revisions, and I believe it's because the
> >     return is now inside the for loop. Before, the return was outside the
> >     loop, so we'd always complete the loop; once the loop completed, we'd
> >     check whether we had enough mpds and, if so, launch. With this
> >     revision, we may never get a chance to complete the loop and thus
> >     never check whether we have enough resources. I think a break would
> >     solve the problem, provided it's acceptable not to respond to all of
> >     the offers; otherwise, we need to make sure to decline all offers.
> 
> Harvey Feng wrote:
>     You're right, I missed this :(. A continue would make sure we decline
>     all the offers if enough tasks are launched.
> 
> Jessica wrote:
>     Yes; however, after further investigation, I've discovered that
>     completing the function results in
>     threading.Thread(target=mpiexec).start() getting called multiple
>     times. So I guess it either needs to go back to how it was before
>     (with the return before the loop), or there needs to be some kind of
>     flag that indicates whether the thread has already been launched. (I
>     used the flag approach, and it worked fine, but maybe you have a
>     better idea.)
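For readers following along, here is a minimal sketch of the flag approach discussed above. Only `threading.Thread(target=mpiexec).start()` and the continue-over-offers idea come from the thread itself; the class name `MPIScheduler`, the counters, and the driver calls (`declineOffer`, `launchTask`) are illustrative assumptions, not the actual mpiexec-mesos.py code.

```python
import threading

def mpiexec():
    # Stand-in for the real mpiexec launcher that runs on its own thread.
    print("launching mpiexec")

class MPIScheduler:
    """Hypothetical scheduler sketch; not the code under review."""

    def __init__(self, needed_mpds):
        self.needed_mpds = needed_mpds   # mpds required before launching
        self.launched_mpds = 0           # mpds started so far
        self.started = False             # flag: mpiexec thread already launched?

    def resourceOffers(self, driver, offers):
        for offer in offers:
            if self.launched_mpds >= self.needed_mpds:
                # Enough mpds already: decline this offer but keep iterating,
                # so every offer gets a response. A `return` here would leave
                # later offers unanswered, which is the bug Jessica found.
                driver.declineOffer(offer)
                continue
            driver.launchTask(offer)     # assumed call: start one mpd
            self.launched_mpds += 1

        # resourceOffers can fire many times, so guard the thread launch
        # with a flag to ensure mpiexec is started exactly once.
        if self.launched_mpds >= self.needed_mpds and not self.started:
            self.started = True
            threading.Thread(target=mpiexec).start()
```

The flag trades a tiny bit of state for correctness: declining leftover offers and launching the thread become independent concerns, so finishing the loop no longer risks a duplicate launch.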
Fixed by adding a flag.

- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> - nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved
>      'driver.stop()' to statusUpdate() so that it stops when all tasks
>      have finished, which occurs when the executor's launched mpd
>      processes have all exited.
> - startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually
>      kills mpd processes. They might be useful during abnormal (cleanup)
>      and normal (shutdown) framework/executor termination...I think.
>      cleanup() still terminates all mpds on the slave, but shutdown()
>      doesn't.
>   -> killTask() stops the mpd associated with the given tid.
>   -> Task states update nicely now. They correspond to the state of a
>      task's associated mpd process.
> - README
>   -> Included additional info on how to set up and run MPICH2 1.2 and
>      nmpiexec on OS X and Ubuntu/Linux.
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e
>   frameworks/mpi/startmpd.sh 44faa05
>   frameworks/mpi/nmpiexec 517bdbc
>   frameworks/mpi/nmpiexec.py a5db9c0
>   frameworks/mpi/mpiexec-mesos PRE-CREATION
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION
>   frameworks/mpi/README.txt cdb4553
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
