> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> > I've been puzzling over why the return is an issue with this revision
> > since it wasn't with earlier revisions, and I believe it's because the
> > return is now inside the for loop. Before, the return was outside of
> > the loop, so we'd always complete the loop; once the loop completed,
> > we'd check whether we had enough mpds, and if so, we'd launch. With
> > this revision, we may never complete the loop and thus never check
> > whether we have enough resources. I think a break would solve the
> > problem, provided it's acceptable not to respond to all of the offers.
> > Otherwise, we need to make sure to decline all offers.
> 
> Harvey Feng wrote:
>     You're right, I missed this :(. A continue would make sure we decline
>     all the offers if enough tasks are launched.
Yes; however, after further investigation, I've discovered that completing
the function results in threading.Thread(target=mpiexec).start() getting
called multiple times. So either it needs to go back to how it was before
(with the return outside of the loop), or there needs to be some kind of
flag that indicates whether the thread has already been launched. (I used
the flag approach and it worked fine, but maybe you have a better idea; a
sketch of it appears at the end of this message.)

- Jessica


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> - nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved
>      'driver.stop()' to statusUpdate() so that the driver stops once all
>      tasks have finished, which occurs when the executor's launched mpd
>      processes have all exited (see the statusUpdate() sketch at the end
>      of this message).
> - startmpd.py
>   -> cleanup() was kept, and shutdown() now includes code that manually
>      kills mpd processes. They might be useful during abnormal (cleanup)
>      and normal (shutdown) framework/executor termination... I think.
>      cleanup() still terminates all mpds on the slave, but shutdown()
>      doesn't.
>   -> killTask() stops the mpd associated with the given tid.
>   -> Task states now update correctly: each corresponds to the state of
>      a task's associated mpd process.
> - README
>   -> Added info on how to set up and run MPICH2 1.2 and nmpiexec on
>      OS X and Ubuntu/Linux.
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e
>   frameworks/mpi/startmpd.sh 44faa05
>   frameworks/mpi/nmpiexec 517bdbc
>   frameworks/mpi/nmpiexec.py a5db9c0
>   frameworks/mpi/mpiexec-mesos PRE-CREATION
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION
>   frameworks/mpi/README.txt cdb4553
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
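
For reference, a minimal sketch of the flag approach described above might
look like the following. This is illustrative only: TOTAL_MPDS,
makeMpdTask(), and the module-level mpiexec function are assumed names,
not the actual code in mpiexec-mesos.py, and the calls follow the old
Mesos Python bindings (mesos.Scheduler, driver.launchTasks(),
driver.declineOffer()).

    import threading

    import mesos      # old Mesos Python bindings (circa 2012)
    import mesos_pb2  # protobuf task states, used in statusUpdate() below

    TOTAL_MPDS = 4    # assumed: number of mpd slots the job needs

    class MPIScheduler(mesos.Scheduler):
        def __init__(self):
            self.mpdsLaunched = 0
            self.finishedTasks = 0
            self.mpiexecStarted = False  # guards the one-time thread launch

        def resourceOffers(self, driver, offers):
            for offer in offers:
                if self.mpdsLaunched >= TOTAL_MPDS:
                    # Enough mpds already: decline rather than return, so
                    # every remaining offer in this batch is released.
                    driver.declineOffer(offer.id)
                    continue
                task = self.makeMpdTask(offer)  # assumed helper that builds the mpd task
                driver.launchTasks(offer.id, [task])
                self.mpdsLaunched += 1
            # The check runs only after the loop completes, and the flag
            # ensures the mpiexec thread is created at most once, even if
            # resourceOffers() fires again with later offers.
            if self.mpdsLaunched >= TOTAL_MPDS and not self.mpiexecStarted:
                self.mpiexecStarted = True
                threading.Thread(target=mpiexec).start()  # mpiexec defined elsewhere

The two properties this buys are that the loop always runs to completion
(every offer is either used or declined) and that the thread launch is
idempotent across repeated resourceOffers() callbacks.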

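Likewise, the quoted summary's note about moving driver.stop() into
statusUpdate() could be sketched roughly as below, as a method on the
MPIScheduler sketch above; the finishedTasks bookkeeping is an assumption,
not the actual nmpiexec.py code.

    def statusUpdate(self, driver, update):
        # Count tasks that have reached a terminal state.
        if update.state in (mesos_pb2.TASK_FINISHED, mesos_pb2.TASK_FAILED,
                            mesos_pb2.TASK_KILLED, mesos_pb2.TASK_LOST):
            self.finishedTasks += 1
        # Stop the driver only once every launched mpd has exited, instead
        # of stopping eagerly from the code path that launches mpiexec.
        if self.finishedTasks == TOTAL_MPDS:
            driver.stop()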