[
https://issues.apache.org/jira/browse/MESOS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270122#comment-13270122
]
[email protected] commented on MESOS-183:
-----------------------------------------------------
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/README.txt, line 19
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105961#file105961line19>
bq. >
bq. > I know it's obvious, but you might want to remind users that they'll need to install mpich2 on every machine in their cluster.
Done.
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/README.txt, line 23
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105961#file105961line23>
bq. >
bq. > Kill whitespace.
Done.
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/README.txt, line 25
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105961#file105961line25>
bq. >
bq. > Kill whitespace.
Done.
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 26
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line26>
bq. >
bq. > s/mpd slots/mpd(s)
Done.
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 71
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line71>
bq. >
bq. > If you move this check into the 'for offer in offers:' on line 60, then you'll only be doing the check and decline in one place (not also on lines 107 and 108).
Done.
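For anyone following along, here is a rough sketch of the shape this takes once the check lives inside the offer loop. It is not the actual nmpiexec.py code: the check itself is not shown in this thread, so the resource test, the dict-based offers, FakeDriver, and handle_offers are all hypothetical stand-ins.

```python
class FakeDriver:
    """Hypothetical stand-in for the scheduler driver; records calls only."""

    def __init__(self):
        self.declined = []
        self.launched = []

    def decline(self, offer_id):
        self.declined.append(offer_id)

    def launch(self, offer_id):
        self.launched.append(offer_id)


def handle_offers(driver, offers, mpds_needed, min_cpus, min_mem):
    """Launch up to mpds_needed mpds, declining every offer that is not used."""
    launched = 0
    for offer in offers:
        # Assumed check: the offer must be large enough and still be needed.
        enough = offer["cpus"] >= min_cpus and offer["mem"] >= min_mem
        if launched >= mpds_needed or not enough:
            # The only place in the function where an offer is declined.
            driver.decline(offer["id"])
            continue
        driver.launch(offer["id"])
        launched += 1
    return launched
```

With the check and the decline together at the top of the loop body, no later branch needs its own decline call.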
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 118
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line118>
bq. >
bq. > Again, I'm not sure how ifhn_slave is going to be used. Can you elaborate?
I had left this in pending Jessica's response; it's removed now.
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 121
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line121>
bq. >
bq. > I love the long options! Thank you!
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 209
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line209>
bq. >
bq. > +1 to Jessica's comment.
This simplifies the trailing '/' check/fix to a single call: os.path.join(options.path, "").
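A quick illustration of why that one call is enough: joining a path with an empty final component appends a separator only when one is missing. The helper name and the example path are made up for this note and are not taken from nmpiexec.py.

```python
import os


def with_trailing_sep(path):
    """Return path with exactly one trailing separator appended if missing."""
    # os.path.join adds a separator before the empty component only when
    # path does not already end with one.
    return os.path.join(path, "")


# Hypothetical install location, with and without the trailing separator.
print(with_trailing_sep("/opt/mpich2/bin"))
print(with_trailing_sep("/opt/mpich2/bin" + os.sep))
```

Both calls print the same path, so no separate check for an existing trailing '/' is needed.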
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 221
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line221>
bq. >
bq. > +1 to Jessica's comment.
Unchanged after using the above.
bq. On 2012-05-04 01:41:20, Benjamin Hindman wrote:
bq. > frameworks/mpi/nmpiexec.py, line 230
bq. > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line230>
bq. >
bq. > mpdtraceerr is not used, kill it please.
Done.
- Harvey
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7541
-----------------------------------------------------------
On 2012-05-02 13:29:50, Harvey Feng wrote:
bq.
bq. (Updated 2012-05-02 13:29:50)
bq.
bq.
bq. Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Some updates to point out:
bq.
bq. -nmpiexec.py
bq. -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that the driver stops once all tasks have finished, which happens when all of the mpd processes launched by the executors have exited.
bq. -startmpd.py
bq. -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. I think both might be useful: cleanup() during abnormal and shutdown() during normal framework/executor termination. cleanup() still terminates all mpds on the slave, but shutdown() doesn't.
bq. -> killtask() stops the mpd associated with the given tid.
bq. -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
bq. -Readme
bq. -> Included additional info on how to set up and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux.
bq.
bq.
bq. This addresses bug MESOS-183.
bq. https://issues.apache.org/jira/browse/MESOS-183
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. frameworks/mpi/README.txt cdb4553
bq. frameworks/mpi/nmpiexec 517bdbc
bq. frameworks/mpi/nmpiexec.py a5db9c0
bq. frameworks/mpi/startmpd.py 8eeba5e
bq. frameworks/mpi/startmpd.sh 44faa05
bq.
bq. Diff: https://reviews.apache.org/r/4768/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Harvey
bq.
bq.
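The driver.stop() change described in the summary above can be pictured roughly like this. It is only a sketch under assumptions: the real statusUpdate() callback receives a Mesos status object, whereas this version takes a plain state string, and FakeDriver, SchedulerSketch, and the TASK_FINISHED string are stand-ins invented for the example.

```python
TASK_FINISHED = "TASK_FINISHED"  # stand-in for the Mesos task state value


class FakeDriver:
    """Hypothetical driver that only remembers whether stop() was called."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class SchedulerSketch:
    """Stops the driver once every launched task has reported as finished."""

    def __init__(self, total_tasks):
        self.total_tasks = total_tasks
        self.finished = 0

    def statusUpdate(self, driver, task_id, state):
        # Simplified signature: task id and state are passed directly.
        if state == TASK_FINISHED:
            self.finished += 1
        if self.finished == self.total_tasks:
            # Every mpd has exited, so the framework has nothing left to do.
            driver.stop()
```

Counting finished tasks in the callback keeps the stop decision in one place and ties it to the mpd processes actually exiting.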
> Included MPI Framework Fails to Start
> -------------------------------------
>
> Key: MESOS-183
> URL: https://issues.apache.org/jira/browse/MESOS-183
> Project: Mesos
> Issue Type: Bug
> Components: documentation, framework
> Environment: Scientific Linux Cluster
> Reporter: Jessica J
> Assignee: Harvey Feng
> Priority: Blocker
> Labels: documentation, mpi, setup
>
> There are really two facets to this issue. The first is that no good
> documentation exists for setting up and using the included MPI framework. The
> second, and more important issue, is that the framework will not run. The
> second issue is possibly related to the first in that I may not be setting it
> up properly.
> To test the MPI framework, by trial and error I determined I needed to run
> python setup.py build and python setup.py install in the
> MESOS-HOME/src/python directory. Now when I try to run nmpiexec -h, I get an
> AttributeError, below:
> Traceback (most recent call last):
>   File "./nmpiexec.py", line 2, in <module>
>     import mesos
>   File "/usr/lib64/python2.6/site-packages/mesos-0.9.0-py2.6-linux-x86_64.egg/mesos.py", line 22, in <module>
>     import _mesos
>   File "/usr/lib64/python2.6/site-packages/mesos-0.9.0-py2.6-linux-x86_64.egg/mesos_pb2.py", line 1286, in <module>
>     DESCRIPTOR.message_types_by_name['FrameworkID'] = _FRAMEWORKID
> AttributeError: 'FileDescriptor' object has no attribute 'message_types_by_name'
> I've examined setup.py and determined that the version of protobuf it
> includes (2.4.1) does, indeed, contain a FileDescriptor class in
> descriptor.py that sets self.message_types_by_name, so I'm not sure what the
> issue is. Is this a bug? Or is there a step I'm missing? Do I need to also
> build/install protobuf?
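One way to narrow down the question above is to check which copy of a package Python actually picks up. The sketch below is written for a current Python 3 interpreter rather than the Python 2.6 shown in the traceback. It assumes, without confirmation from this thread, that a different protobuf install taking precedence on the path is one possible cause. locate_module is a hypothetical helper name.

```python
import importlib.util


def locate_module(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when a parent package of a dotted name is missing.
        return None
    if spec is None:
        return None
    return spec.origin


# Shows where the protobuf package would come from, if it is installed at all.
print(locate_module("google.protobuf"))
```

If the printed path points somewhere other than the protobuf egg that setup.py installed, that would be worth ruling out first.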