> On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/README.txt, line 11 > > <https://reviews.apache.org/r/4768/diff/1/?file=102473#file102473line11> > > > > mpd was deprecated? What's the current alternative?
I think the new versions use the Hydra process manager, so 'mpiexec' would be the only command needed to launch an MPI program. > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/nmpiexec.py, line 22 > > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line22> > > > > Remove or comment this debugging. done. > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/startmpd.py, line 83 > > <https://reviews.apache.org/r/4768/diff/1/?file=102475#file102475line83> > > > > Use os.kill instead (and above). done. > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/startmpd.py, line 56 > > <https://reviews.apache.org/r/4768/diff/1/?file=102475#file102475line56> > > > > Can we use MPD's exit status to determine when to send TASK_FAILED or > > TASK_KILLED? ok, fixed that. > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/startmpd.py, line 15 > > <https://reviews.apache.org/r/4768/diff/1/?file=102475#file102475line15> > > > > I think we can get rid of this entirely; it's clearly wrong in the case > > where multiple MPIs are running, and we should be tracking stray processes > > so we eventually kill them if MPD doesn't do something funny. (And if it > > does, we should figure out how to disable that.) ok - shutdown() should remove any stray processes left over. > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/nmpiexec.py, line 210 > > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line210> > > > > Let's try a name that doesn't contain test or Python and will give a > > hint when multiple instances are running, like something using MPI_TASK. changed to 'MPI: ' + MPI_TASK, and added a --name option > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/nmpiexec.py, line 95 > > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line95> > > > > Remove trailing whitespace. done > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/nmpiexec.py, line 31 > > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line31> > > > > Can we avoid using the shell here (and having MPI_TASK be interpreted > > by the shell twice)? ok > On 2012-04-18 05:41:37, Charles Reiss wrote: > > frameworks/mpi/README.txt, line 37 > > <https://reviews.apache.org/r/4768/diff/1/?file=102473#file102473line37> > > > > We should probably support taking the path to these binaries an option > > passed automatically to the executor (e.g. through an environment variable > > option) to avoid PATH issues. ok. Passes the directory to mpi binaries using the executor's CommandInfo - Harvey ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4768/#review6999 ----------------------------------------------------------- On 2012-04-20 08:17:57, Harvey Feng wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4768/ > ----------------------------------------------------------- > > (Updated 2012-04-20 08:17:57) > > > Review request for mesos, Benjamin Hindman and Charles Reiss. > > > Summary > ------- > > Some updates to point out: > > -nmpiexec.py > -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved > 'driver.stop()' to statusUpdate() so that it stops when all tasks have been > finished, which occurs when the executor's launched mpd processes have all > exited. > -startmpd.py > -> Didn't remove cleanup(), and added code in shutdown() that manually > kills mpd processes. They might be useful during abnormal (cleanup) and > normal (shutdown) framework/executor termination...I think. cleanup() still > terminates all mpd's in the slave, but shutdown doesn't. > -> killtask() stops the mpd associated with the given tid. > -> Task states update nicely now. They correspond to the state of a task's > associated mpd process. > -Readme > -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec > on OS X and Ubuntu/Linux > > > This addresses bug MESOS-183. > https://issues.apache.org/jira/browse/MESOS-183 > > > Diffs > ----- > > frameworks/mpi/README.txt cdb4553 > frameworks/mpi/nmpiexec.py a5db9c0 > frameworks/mpi/startmpd.py 8eeba5e > > Diff: https://reviews.apache.org/r/4768/diff > > > Testing > ------- > > > Thanks, > > Harvey > >
