[
https://issues.apache.org/jira/browse/MESOS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266571#comment-13266571
]
[email protected] commented on MESOS-183:
-----------------------------------------------------
bq. On 2012-04-24 21:45:19, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 209
bq. > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line209>
bq. >
bq. > I'm not really sure how this can be used: the user running this
bq. > script will not know what machines they might run on, so they can't
bq. > possibly know which IP addresses they want to use on those machines.
bq. > Maybe Jessica J. had something else in mind here?
bq. >
bq. > It definitely makes sense to keep --ifhn for the master.
Hmmm... Looks like my comment here disappeared somehow. Anyway, I agree that
the --ifhn-slave option doesn't make sense, since there's no way to specify an
IP address for each slave. What I had in mind was a more general Mesos
configuration option rather than one specific to the MPI framework.
bq. From a selfish standpoint, I'm not terribly concerned, since the master
bq. was the option I cared about. However, I've been thinking that, assuming
bq. you're using the deploy scripts to start your cluster, it may be worth
bq. modifying the format of the slaves configuration file (which currently
bq. lists only hostnames) to let the user also specify an IP address for each
bq. host. The MPI framework could then grab the IP address from the Mesos
bq. configuration. This would be useful for deploying Mesos as well, since
bq. some users (such as myself) keep their Mesos config files in an NFS
bq. directory; with that setup I can't start the entire cluster in one go if
bq. any node needs a specific IP address, because all nodes read the same ip
bq. option from mesos.conf. Just a thought... I'll open a general Mesos
bq. "Improvement" ticket if there's any chance of it happening.
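The slaves-file change proposed above could be as small as allowing an
optional second column. A minimal sketch (the file format and parser are
hypothetical, not anything Mesos ships):

```python
def parse_slaves(lines):
    """Parse slaves-file lines of the form 'hostname[ ip]' into
    (hostname, ip_or_None) tuples, skipping blanks and '#' comments."""
    hosts = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split()
        hosts.append((parts[0], parts[1] if len(parts) > 1 else None))
    return hosts

print(parse_slaves(["node1 10.0.0.1", "node2", "# a comment", ""]))
# [('node1', '10.0.0.1'), ('node2', None)]
```

Hosts without an explicit IP would keep today's behavior (resolve the
hostname), so existing slaves files stay valid.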
bq. On 2012-04-24 21:45:19, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 223
bq. > <https://reviews.apache.org/r/4768/diff/3/?file=103693#file103693line223>
bq. >
bq. > It looks like you assume that path ends in a '/'. You should
bq. > probably check this here.
Why not use os.path.join?
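For context, os.path.join sidesteps the trailing-slash question entirely
(paths here are illustrative, not from the patch):

```python
import os.path

# os.path.join inserts a separator only where one is missing, so the
# caller's trailing-slash convention stops mattering:
print(os.path.join("/tmp/mpi", "startmpd.sh"))   # /tmp/mpi/startmpd.sh
print(os.path.join("/tmp/mpi/", "startmpd.sh"))  # /tmp/mpi/startmpd.sh
```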
- Jessica
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7179
-----------------------------------------------------------
On 2012-05-02 13:29:50, Harvey Feng wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4768/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-05-02 13:29:50)
bq.
bq.
bq. Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Some updates to point out:
bq.
bq. -nmpiexec.py
bq. -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved
bq. 'driver.stop()' into statusUpdate() so that the driver stops once all
bq. tasks have finished, which happens when the executor's launched mpd
bq. processes have all exited.
bq. -startmpd.py
bq. -> Kept cleanup(), and added code in shutdown() that manually kills mpd
bq. processes. They should be useful during abnormal (cleanup) and normal
bq. (shutdown) framework/executor termination... I think. cleanup() still
bq. terminates all mpd's on the slave, but shutdown() doesn't.
bq. -> killTask() stops the mpd associated with the given tid.
bq. -> Task states update nicely now. They correspond to the state of a
bq. task's associated mpd process.
bq. -Readme
bq. -> Added info on how to set up and run MPICH2 1.2 and nmpiexec on
bq. OS X and Ubuntu/Linux.
bq.
bq.
bq. This addresses bug MESOS-183.
bq. https://issues.apache.org/jira/browse/MESOS-183
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. frameworks/mpi/README.txt cdb4553
bq. frameworks/mpi/nmpiexec 517bdbc
bq. frameworks/mpi/nmpiexec.py a5db9c0
bq. frameworks/mpi/startmpd.py 8eeba5e
bq. frameworks/mpi/startmpd.sh 44faa05
bq.
bq. Diff: https://reviews.apache.org/r/4768/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Harvey
bq.
bq.
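The statusUpdate() change Harvey describes above amounts to stopping the
driver once every launched task reaches a terminal state. A minimal sketch of
that logic, with stand-in task-state constants and a stub driver in place of
the real mesos_pb2 constants and MesosSchedulerDriver:

```python
# Stand-ins for mesos_pb2 task states; only the terminal/non-terminal
# distinction matters for this sketch.
TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_LOST = range(4)
TERMINAL_STATES = {TASK_FINISHED, TASK_FAILED, TASK_LOST}

class StubDriver(object):
    """Minimal stand-in for MesosSchedulerDriver."""
    def __init__(self):
        self.stopped = False
    def stop(self):
        self.stopped = True

class MPIScheduler(object):
    def __init__(self, task_ids):
        # Tasks whose mpd process is still running on some slave.
        self.pending = set(task_ids)

    def statusUpdate(self, driver, task_id, state):
        # Each task mirrors one mpd; once every mpd has exited
        # (e.g. after mpdallexit tears down the ring), stop the driver.
        if state in TERMINAL_STATES:
            self.pending.discard(task_id)
        if not self.pending:
            driver.stop()

sched = MPIScheduler(["mpd-0", "mpd-1"])
driver = StubDriver()
sched.statusUpdate(driver, "mpd-0", TASK_FINISHED)
print(driver.stopped)  # False: one mpd is still running
sched.statusUpdate(driver, "mpd-1", TASK_FINISHED)
print(driver.stopped)  # True: all mpds exited, so the driver stops
```

The real callback receives a TaskStatus protobuf rather than bare arguments,
but the stop-when-all-terminal bookkeeping is the same.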
> Included MPI Framework Fails to Start
> -------------------------------------
>
> Key: MESOS-183
> URL: https://issues.apache.org/jira/browse/MESOS-183
> Project: Mesos
> Issue Type: Bug
> Components: documentation, framework
> Environment: Scientific Linux Cluster
> Reporter: Jessica J
> Assignee: Harvey Feng
> Priority: Blocker
> Labels: documentation, mpi, setup
>
> There are really two facets to this issue. The first is that no good
> documentation exists for setting up and using the included MPI framework. The
> second, and more important issue, is that the framework will not run. The
> second issue is possibly related to the first in that I may not be setting it
> up properly.
> To test the MPI framework, I determined by trial and error that I needed to
> run python setup.py build and python setup.py install in the
> MESOS-HOME/src/python directory. Now when I try to run nmpiexec -h, I get
> the AttributeError below:
> Traceback (most recent call last):
>   File "./nmpiexec.py", line 2, in <module>
>     import mesos
>   File "/usr/lib64/python2.6/site-packages/mesos-0.9.0-py2.6-linux-x86_64.egg/mesos.py", line 22, in <module>
>     import _mesos
>   File "/usr/lib64/python2.6/site-packages/mesos-0.9.0-py2.6-linux-x86_64.egg/mesos_pb2.py", line 1286, in <module>
>     DESCRIPTOR.message_types_by_name['FrameworkID'] = _FRAMEWORKID
> AttributeError: 'FileDescriptor' object has no attribute 'message_types_by_name'
> I've examined setup.py and determined that the version of protobuf it
> includes (2.4.1) does, indeed, contain a FileDescriptor class in
> descriptor.py that sets self.message_types_by_name, so I'm not sure what the
> issue is. Is this a bug? Or is there a step I'm missing? Do I need to also
> build/install protobuf?
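An AttributeError like the one in the traceback is the classic symptom of an
older protobuf installation on sys.path shadowing the 2.4.1 copy that
setup.py bundles: Python imports the first match it finds. A diagnostic
sketch (not part of the original report) that lists every sys.path entry able
to satisfy an import, in resolution order:

```python
import os
import sys

def candidate_locations(package):
    """Return every sys.path entry that could satisfy `import package`,
    in resolution order -- the first hit wins, so a stale copy earlier
    on the path shadows a newer install later on it."""
    hits = []
    for entry in sys.path:
        if not entry or not os.path.isdir(entry):
            continue
        if (os.path.isdir(os.path.join(entry, package))
                or os.path.isfile(os.path.join(entry, package + ".py"))):
            hits.append(entry)
    return hits

# e.g. candidate_locations("google") would reveal whether more than one
# protobuf tree is visible to this interpreter; "json" is just a stdlib
# package used here as a safe demonstration.
print(candidate_locations("json"))
```

More than one hit for the protobuf package would explain importing a version
whose FileDescriptor predates message_types_by_name.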
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira