> On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/nmpiexec.py, line 270 > > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line270> > > > > Did someone actually ask for this feature?
No, but I thought it wouldn't hurt... > On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/nmpiexec.py, line 253 > > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line253> > > > > I'd like to make this whole thing simpler. In particular, I don't see > > any need for an executor here (i.e., the startmpd.py script). We actually > > have a 1-1 mapping from tasks to mpd's, so let's just have our TaskInfo's > > have a CommandInfo which just launches the mpd. Something like this for > > that CommandInfo's value: > > > > ...command.value = "${MPICH2PATH}mpd -n --host=${MPD_MASTER_IP} > > --port=${MPD_MASTER_PORT}" > > > > (Note your code, as does what I have above, assumes that MPICH2PATH > > includes the trailing '/'. Also, you might need to change the string based > > on if you're passing --ifhn. Finally, note I used the long options --host > > and --port. This is for readability for other people that might not know > > mpd very well. Likewise, if there is a long option for -n it would be great > > to use that instead.) > > > > Of course, this will require setting the command's environment > > variables appropriately. But, this should let us completely eliminate the > > startmpd.py script!!!!! Yeah! Less things to maintain! Ok. Switched to no-executor. I directly specified the variables during string/command construction, so there isn't a need to set environment variables. Got rid of startmpd.py/startmpd.h =) > On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/nmpiexec.py, line 199 > > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line199> > > > > s/slot/mpd > > > > Same as above, is there something we can/should set when we launch the > > mpd? There doesn't seem to be anything to set for memory when launching the mpd... > On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/nmpiexec.py, line 28 > > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line28> > > > > I know this wasn't the style of the codebase you've inherited, but I'd > > like spaces around all operators please. For example, this line should read: > > > > print "Got " + str(TOTAL_TASKS) + " mpd slots, running mpiexec" > > > > It looks like you've already done this with some of the code you've > > added (which is awesome!), but please clean up all the code. Thanks! Done. I converted most string operations to use %, roughly following the Google style guide for Python. > On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/README.txt, line 76 > > <https://reviews.apache.org/r/4768/diff/3/?file=103692#file103692line76> > > > > Does nmpiexec work out of the box without changes (not included in this > > review). > > > > Also, let's change the names from "nmpiexec*" to "mpiexec-mesos*"! ;) > > > > Finally, just pass host:port, no need for the 'master@' prefix. Yup, nmpiexec works. > On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/nmpiexec.py, line 196 > > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line196> > > > > s/slot/mpd > > > > Also, do we want to set the '--ncpus' option on the actual mpd that we > > launch on a Mesos slave? Added --npus to slave mpd calls. > On 2012-04-24 21:45:19, Benjamin Hindman wrote: > > frameworks/mpi/nmpiexec.py, lines 91-92 > > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line91> > > > > Use driver.declineOffer please. Done for this and others below. Issue#188 (small bug fix for Python/C++ binding) would have to be committed for driver.declineOffer to work though. - Harvey ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4768/#review7179 ----------------------------------------------------------- On 2012-05-02 13:00:12, Harvey Feng wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4768/ > ----------------------------------------------------------- > > (Updated 2012-05-02 13:00:12) > > > Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica. > > > Summary > ------- > > Some updates to point out: > > -nmpiexec.py > -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved > 'driver.stop()' to statusUpdate() so that it stops when all tasks have been > finished, which occurs when the executor's launched mpd processes have all > exited. > -startmpd.py > -> Didn't remove cleanup(), and added code in shutdown() that manually > kills mpd processes. They might be useful during abnormal (cleanup) and > normal (shutdown) framework/executor termination...I think. cleanup() still > terminates all mpd's in the slave, but shutdown doesn't. > -> killtask() stops the mpd associated with the given tid. > -> Task states update nicely now. They correspond to the state of a task's > associated mpd process. > -Readme > -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec > on OS X and Ubuntu/Linux > > > This addresses bug MESOS-183. > https://issues.apache.org/jira/browse/MESOS-183 > > > Diffs > ----- > > frameworks/mpi/nmpiexec.py a5db9c0 > frameworks/mpi/README.txt cdb4553 > frameworks/mpi/startmpd.py 8eeba5e > frameworks/mpi/startmpd.sh 44faa05 > > Diff: https://reviews.apache.org/r/4768/diff > > > Testing > ------- > > > Thanks, > > Harvey > >
