[
https://issues.apache.org/jira/browse/MESOS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258079#comment-13258079
]
[email protected] commented on MESOS-183:
-----------------------------------------------------
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/README.txt, line 11
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102473#file102473line11>
bq. >
bq. > mpd was deprecated? What's the current alternative?
I think the new versions use the Hydra process manager, so 'mpiexec' would be
the only command needed to launch an MPI program.
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/nmpiexec.py, line 22
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line22>
bq. >
bq. > Remove or comment this debugging.
done.
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/startmpd.py, line 83
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102475#file102475line83>
bq. >
bq. > Use os.kill instead (and above).
done.
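
A minimal sketch of signalling a tracked child process with os.kill. The helper name stop_process and the sleeping stand-in child are illustrative assumptions, not code from startmpd.py, and the example assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys


def stop_process(proc, sig=signal.SIGTERM):
    """Send `sig` to a child started with subprocess.Popen and reap it.

    Hypothetical helper: the signal goes to the recorded PID, so no
    shell or external kill command is involved.
    """
    if proc.poll() is None:  # only signal a process that is still running
        os.kill(proc.pid, sig)
    return proc.wait()  # exit status; negative if terminated by a signal


# Stand-in for an mpd process: a child that sleeps until it is signalled.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
status = stop_process(child)
```

Keeping the Popen object (or at least the PID) for each launched process is what makes this possible, and it also gives the executor a list of processes to clean up later.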
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/startmpd.py, line 56
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102475#file102475line56>
bq. >
bq. > Can we use MPD's exit status to determine when to send TASK_FAILED
bq. > or TASK_KILLED?
ok, fixed that.
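
One way such a mapping could look, as a sketch only. The function name, the kill_requested flag, and the policy itself are assumptions; the state names are plain strings standing in for the Mesos task-state constants, and the actual change in startmpd.py may differ.

```python
def task_state_for_exit(returncode, kill_requested=False):
    """Pick a task state name from an mpd exit status.

    Assumed policy, for illustration only:
      - a kill requested through the executor always reports TASK_KILLED
      - exit status 0 reports TASK_FINISHED
      - anything else (non-zero exit or death by signal) reports TASK_FAILED
    """
    if kill_requested:
        return "TASK_KILLED"
    if returncode == 0:
        return "TASK_FINISHED"
    return "TASK_FAILED"


# Example: an mpd that exited with status 1 without any kill request.
state = task_state_for_exit(1)
```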
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/startmpd.py, line 15
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102475#file102475line15>
bq. >
bq. > I think we can get rid of this entirely; it's clearly wrong in the
bq. > case where multiple MPIs are running, and we should be tracking stray
bq. > processes so we eventually kill them if MPD doesn't do something funny.
bq. > (And if it does, we should figure out how to disable that.)
ok - shutdown() should remove any stray processes left over.
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/nmpiexec.py, line 210
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line210>
bq. >
bq. > Let's try a name that doesn't contain test or Python and will give a
bq. > hint when multiple instances are running, like something using MPI_TASK.
changed to 'MPI: ' + MPI_TASK, and added a --name option
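
A sketch of how a --name option with that default could be wired up. It uses argparse, and the positional mpi_task argument is an assumption made for illustration; the real nmpiexec.py option handling is not shown in this thread.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical nmpiexec command line.

    If --name is not given, the framework name defaults to
    'MPI: ' + MPI_TASK so that concurrent instances can be told apart.
    """
    parser = argparse.ArgumentParser(prog="nmpiexec")
    parser.add_argument("--name", default=None,
                        help="framework name (default: 'MPI: ' + MPI_TASK)")
    parser.add_argument("mpi_task", help="MPI program to launch")
    args = parser.parse_args(argv)
    if args.name is None:
        args.name = "MPI: " + args.mpi_task
    return args


default_args = parse_args(["./a.out"])
named_args = parse_args(["--name", "nightly-sim", "./a.out"])
```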
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/nmpiexec.py, line 95
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line95>
bq. >
bq. > Remove trailing whitespace.
done
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/nmpiexec.py, line 31
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102474#file102474line31>
bq. >
bq. > Can we avoid using the shell here (and having MPI_TASK be
bq. > interpreted by the shell twice)?
ok
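
For reference, a sketch of launching without a shell: the command is built as an argument list, so MPI_TASK is split once by shlex and never re-interpreted. The helper names build_command and launch are illustrative assumptions, not the code in nmpiexec.py.

```python
import shlex
import subprocess


def build_command(mpiexec, num_procs, mpi_task):
    """Build an argv list for mpiexec.

    MPI_TASK is split exactly once with shlex; the result is passed to
    the OS as separate arguments, so quoting in MPI_TASK is not parsed
    again by a shell.
    """
    return [mpiexec, "-n", str(num_procs)] + shlex.split(mpi_task)


def launch(argv):
    """Start the command directly (no shell=True)."""
    return subprocess.Popen(argv)


argv = build_command("mpiexec", 4, "./a.out 'hello world'")
```

Calling launch(argv) would then start mpiexec directly, provided it is installed and resolvable.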
bq. On 2012-04-18 05:41:37, Charles Reiss wrote:
bq. > frameworks/mpi/README.txt, line 37
bq. > <https://reviews.apache.org/r/4768/diff/1/?file=102473#file102473line37>
bq. >
bq. > We should probably support taking the path to these binaries as an
bq. > option passed automatically to the executor (e.g. through an environment
bq. > variable option) to avoid PATH issues.
ok. Passes the directory to mpi binaries using the executor's CommandInfo
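
On the executor side, picking up such a directory might look like the sketch below. The variable name MPI_BIN_DIR and the helper resolve_binary are hypothetical, chosen only for illustration; how the value actually travels inside CommandInfo is not shown here.

```python
import os


def resolve_binary(name, env=None, var="MPI_BIN_DIR"):
    """Return the path to an MPI binary.

    If the (hypothetical) variable `var` names a directory, the binary is
    looked up there; otherwise the bare name is returned and resolution
    falls back to PATH.
    """
    env = os.environ if env is None else env
    bin_dir = env.get(var)
    if bin_dir:
        return os.path.join(bin_dir, name)
    return name


mpd_path = resolve_binary("mpd", env={"MPI_BIN_DIR": "/opt/mpich2/bin"})
fallback = resolve_binary("mpd", env={})
```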
- Harvey
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review6999
-----------------------------------------------------------
On 2012-04-20 08:17:57, Harvey Feng wrote:
bq.
bq. (Updated 2012-04-20 08:17:57)
bq.
bq.
bq. Review request for mesos, Benjamin Hindman and Charles Reiss.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Some updates to point out:
bq.
bq. -nmpiexec.py
bq. -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved
bq. 'driver.stop()' to statusUpdate() so that it stops when all tasks have
bq. finished, which occurs when the executor's launched mpd processes have all
bq. exited.
bq. -startmpd.py
bq. -> Didn't remove cleanup(), and added code in shutdown() that manually
bq. kills mpd processes. They might be useful during abnormal (cleanup) and
bq. normal (shutdown) framework/executor termination, I think. cleanup() still
bq. terminates all mpds on the slave, but shutdown() doesn't.
bq. -> killtask() stops the mpd associated with the given tid.
bq. -> Task states update nicely now. They correspond to the state of a
task's associated mpd process.
bq. -Readme
bq. -> Included additional info on how to set up and run MPICH2 1.2 and
bq. nmpiexec on OS X and Ubuntu/Linux
bq.
bq.
bq. This addresses bug MESOS-183.
bq. https://issues.apache.org/jira/browse/MESOS-183
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. frameworks/mpi/README.txt cdb4553
bq. frameworks/mpi/nmpiexec.py a5db9c0
bq. frameworks/mpi/startmpd.py 8eeba5e
bq.
bq. Diff: https://reviews.apache.org/r/4768/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Harvey
bq.
bq.
> Included MPI Framework Fails to Start
> -------------------------------------
>
> Key: MESOS-183
> URL: https://issues.apache.org/jira/browse/MESOS-183
> Project: Mesos
> Issue Type: Bug
> Components: documentation, framework
> Environment: Scientific Linux Cluster
> Reporter: Jessica J
> Assignee: Harvey Feng
> Priority: Blocker
> Labels: documentation, mpi, setup
>
> There are really two facets to this issue. The first is that no good
> documentation exists for setting up and using the included MPI framework. The
> second, and more important issue, is that the framework will not run. The
> second issue is possibly related to the first in that I may not be setting it
> up properly.
> To test the MPI framework, by trial and error I determined I needed to run
> python setup.py build and python setup.py install in the
> MESOS-HOME/src/python directory. Now when I try to run nmpiexec -h, I get an
> AttributeError, below:
> Traceback (most recent call last):
>   File "./nmpiexec.py", line 2, in <module>
>     import mesos
>   File "/usr/lib64/python2.6/site-packages/mesos-0.9.0-py2.6-linux-x86_64.egg/mesos.py", line 22, in <module>
>     import _mesos
>   File "/usr/lib64/python2.6/site-packages/mesos-0.9.0-py2.6-linux-x86_64.egg/mesos_pb2.py", line 1286, in <module>
>     DESCRIPTOR.message_types_by_name['FrameworkID'] = _FRAMEWORKID
> AttributeError: 'FileDescriptor' object has no attribute 'message_types_by_name'
> I've examined setup.py and determined that the version of protobuf it
> includes (2.4.1) does, indeed, contain a FileDescriptor class in
> descriptor.py that sets self.message_types_by_name, so I'm not sure what the
> issue is. Is this a bug? Or is there a step I'm missing? Do I need to also
> build/install protobuf?