> On 2012-04-24 21:45:19, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 209
> > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line209>
> >
> > I'm not really sure how this can be used: the user running this script will not know which machines they might run on, so they can't possibly know which IP addresses they want to use on those machines. Maybe Jessica J. had something else in mind here?
> >
> > It definitely makes sense to keep --ifhn for the master.
Hmmm... Looks like my comment here disappeared somehow. Anyway, I agree that the --ifhn-slave option doesn't make sense, since there's no way to specify an IP address for each slave. What I had in mind was a more general Mesos configuration option rather than something specific to the MPI framework.

From a selfish standpoint, I'm not terribly concerned, since the master was the option I cared about. However, I've been thinking that, assuming you're using the deploy scripts to start your cluster, it may be worth modifying the format of the slaves configuration file (which currently lists only hostnames) to let the user also specify an IP address for each host. The MPI framework could then grab the IP address from the Mesos configuration. This would be useful for deploying Mesos as well, since some users (such as myself) keep their Mesos config files in an NFS directory. (With that setup I can't start the entire cluster in one go if I need to give any of my nodes a specific IP address, since all nodes will try to use the same ip option in mesos.conf.) Just a thought... I'll open a general Mesos "Improvement" ticket if there's any chance of it happening.

> On 2012-04-24 21:45:19, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 223
> > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line223>
> >
> > It looks like you assume that path ends in a '/'. You should probably check this here. Why not use os.path.join?

- Jessica

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7179
-----------------------------------------------------------

On 2012-05-02 13:29:50, Harvey Feng wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail.
> To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
>
> (Updated 2012-05-02 13:29:50)
>
>
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
>
>
> Summary
> -------
>
> Some updates to point out:
>
> - nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that the driver stops once all tasks have finished, which happens when the executor's launched mpd processes have all exited.
> - startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination... I think. cleanup() still terminates all mpds on the slave, but shutdown() doesn't.
>   -> killTask() stops the mpd associated with the given tid.
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> - README
>   -> Included additional info on how to set up and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux.
>
>
> This addresses bug MESOS-183.
> https://issues.apache.org/jira/browse/MESOS-183
>
>
> Diffs
> -----
>
>   frameworks/mpi/README.txt cdb4553
>   frameworks/mpi/nmpiexec 517bdbc
>   frameworks/mpi/nmpiexec.py a5db9c0
>   frameworks/mpi/startmpd.py 8eeba5e
>   frameworks/mpi/startmpd.sh 44faa05
>
> Diff: https://reviews.apache.org/r/4768/diff
>
>
> Testing
> -------
>
>
> Thanks,
>
> Harvey
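For readers following along, the "move driver.stop() into statusUpdate()" change Harvey describes boils down to counting terminal task updates and stopping the driver only when every launched mpd task has exited. Here is a minimal, self-contained sketch of that pattern; the class and method names mirror the Mesos Python scheduler callback loosely (the real statusUpdate receives a TaskStatus object, not a bare state string), so treat this as illustrative rather than the actual nmpiexec.py code:

```python
# Hypothetical sketch of the pattern in Harvey's change: stop the
# scheduler driver only after all launched tasks reach TASK_FINISHED.
TASK_FINISHED = "TASK_FINISHED"  # stand-in for the protobuf enum value


class MPIScheduler:
    def __init__(self, total_tasks):
        self.total_tasks = total_tasks  # number of mpd tasks launched
        self.finished = 0               # terminal updates seen so far

    def statusUpdate(self, driver, state):
        # Count each finished mpd task; once all of them have exited
        # (i.e. mpdallexit has torn down the ring), stop the driver.
        if state == TASK_FINISHED:
            self.finished += 1
        if self.finished == self.total_tasks:
            driver.stop()
```

The point of the change is that calling driver.stop() earlier (e.g. right after launching) would tear the framework down while mpd processes were still running; deferring it to the last terminal status update ties shutdown to actual task completion.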

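On Ben's earlier point about nmpiexec.py assuming that path ends in a '/': os.path.join sidesteps the whole class of bugs, because it inserts the separator only when one is missing. A quick illustration (the /opt/mpich2 prefix here is a made-up example path, not from the patch):

```python
import os.path

mpich_path = "/opt/mpich2"  # hypothetical install prefix, no trailing slash

# Plain concatenation silently produces a bad path when the prefix
# lacks a trailing '/':
broken = mpich_path + "bin/mpd"               # -> "/opt/mpich2bin/mpd"

# os.path.join adds the separator only where needed, so both prefix
# spellings yield the same correct path:
good = os.path.join(mpich_path, "bin", "mpd")          # -> "/opt/mpich2/bin/mpd"
also_good = os.path.join(mpich_path + "/", "bin", "mpd")  # same result
```

One caveat worth remembering: if a later component is itself absolute (starts with '/'), os.path.join discards everything before it, so it shouldn't be fed untrusted absolute paths blindly.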