-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review6999
-----------------------------------------------------------


frameworks/mpi/README.txt
<https://reviews.apache.org/r/4768/#comment15565>

    mpd was deprecated? What's the current alternative?


frameworks/mpi/README.txt
<https://reviews.apache.org/r/4768/#comment15566>

    We should probably support taking the path to these binaries as an
    option passed automatically to the executor (e.g. through an
    environment variable option) to avoid PATH issues.


frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment15555>

    Remove or comment out this debugging output.


frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment15563>

    Can we avoid using the shell here (and having MPI_TASK be interpreted
    by the shell twice)?


frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment15561>

    Remove trailing whitespace.


frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment15557>

    Let's try a name that doesn't contain test or Python and will give a
    hint when multiple instances are running, like something using
    MPI_TASK.


frameworks/mpi/startmpd.py
<https://reviews.apache.org/r/4768/#comment15562>

    I think we can get rid of this entirely; it's clearly wrong in the
    case where multiple MPIs are running, and we should be tracking stray
    processes so we eventually kill them if MPD doesn't do something
    funny. (And if it does, we should figure out how to disable that.)


frameworks/mpi/startmpd.py
<https://reviews.apache.org/r/4768/#comment15559>

    Can we use MPD's exit status to determine when to send TASK_FAILED or
    TASK_KILLED?


frameworks/mpi/startmpd.py
<https://reviews.apache.org/r/4768/#comment15558>

    Use os.kill instead (and above).


- Charles


On 2012-04-18 04:27:25, Harvey Feng wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
>
> (Updated 2012-04-18 04:27:25)
>
>
> Review request for mesos, Benjamin Hindman and Charles Reiss.
>
>
> Summary
> -------
>
> Some updates to point out:
>
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved
>      'driver.stop()' to statusUpdate() so that it stops when all tasks have been
>      finished, which occurs when the executor's launched mpd processes have all
>      exited.
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually
>      kills mpd processes. They might be useful during abnormal (cleanup) and
>      normal (shutdown) framework/executor termination...I think. cleanup() still
>      terminates all mpd's in the slave, but shutdown doesn't.
>   -> killtask() stops the mpd associated with the given tid.
>   -> Task states update nicely now. They correspond to the state of a task's
>      associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec
>      on OS X and Ubuntu/Linux
>
>
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
>
>
> Diffs
> -----
>
>   frameworks/mpi/README.txt cdb4553
>   frameworks/mpi/nmpiexec.py a5db9c0
>   frameworks/mpi/startmpd.py 8eeba5e
>
> Diff: https://reviews.apache.org/r/4768/diff
>
>
> Testing
> -------
>
>
> Thanks,
>
> Harvey
>
>
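
For reference, a minimal sketch of what the nmpiexec.py and startmpd.py
comments above are asking for: launching without a shell, stopping a child
with os.kill, and deriving a task state from the exit status. It is not taken
from the diff. The helper names (build_argv, launch, stop,
state_from_returncode) and the string state constants are hypothetical, and
it assumes the launched process exits non-zero on failure, which would need
to be checked for mpd.

```python
import errno
import os
import shlex
import signal
import subprocess

# Hypothetical stand-ins; a real executor would use the Mesos task-state
# enum values instead of plain strings.
TASK_FINISHED = "TASK_FINISHED"
TASK_FAILED = "TASK_FAILED"
TASK_KILLED = "TASK_KILLED"


def build_argv(mpi_task, num_procs):
    """Split the user's command once, so no shell ever re-parses it."""
    return ["mpiexec", "-n", str(num_procs)] + shlex.split(mpi_task)


def launch(argv):
    """Start a child from an argument list (shell=False is the default)."""
    return subprocess.Popen(argv)


def stop(proc):
    """Signal the child directly instead of running an external `kill`."""
    try:
        os.kill(proc.pid, signal.SIGTERM)
    except OSError as e:
        if e.errno != errno.ESRCH:  # ESRCH: process already gone
            raise


def state_from_returncode(returncode):
    """Map a Popen returncode to a task state.

    On POSIX, Popen reports death by signal N as returncode -N.
    """
    if returncode == 0:
        return TASK_FINISHED
    if returncode < 0:
        return TASK_KILLED
    return TASK_FAILED
```

An executor could call state_from_returncode(proc.wait()) once the child
exits and send the result in its status update. Keeping one Popen object per
task also gives killtask() a specific pid to signal, rather than killing
every mpd on the slave.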
