> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> >     I've been puzzling over why the return is an issue with this revision 
> > since it wasn't with earlier revisions, and I believe it's due to the fact 
> > that the return is within the for loop. Before, this return was outside of 
> > the loop, so we'd always complete the loop. Once the loop completed, we'd 
> > check if we had enough mpds, and if so, we'd launch. With this revision, we 
> > may never get a chance to complete the loop and thus never check if we have 
> > enough resources. I think a break would solve the problem, provided it's 
> > acceptable not to respond to all of the offers. Otherwise, we need to make 
> > sure to decline all offers.
> 
> Harvey Feng wrote:
>     You're right, I missed this :(. A continue would make sure we decline all 
> the offers if enough tasks are launched.
> 
> Jessica wrote:
>     Yes; however, after further investigation, I've discovered that 
> completing the function results in threading.Thread(target=mpiexec).start() 
> getting called multiple times. So I guess it either needs to go back to how 
> it was before (with the return before the loop) or there needs to be some 
> kind of flag that indicates whether the thread has already been launched. (I 
> used the flag approach, and it worked fine, but maybe you have a better idea.)

Fixed by adding a flag.
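
For reference, here is a minimal sketch of the flag approach discussed above. It is not the actual diff: the class, the method name resource_offers, and the fields started and declined are hypothetical stand-ins, and the Mesos driver calls are replaced with plain bookkeeping so the snippet runs on its own. It assumes that offers arriving after enough mpds are up get declined with a continue, and that the mpiexec thread is started at most once.

```python
import threading


class MPIScheduler:
    """Hypothetical stand-in for the scheduler in mpiexec-mesos.py."""

    def __init__(self, total_mpds, mpiexec):
        self.total_mpds = total_mpds  # mpds needed before mpiexec can run
        self.mpds_launched = 0
        self.mpiexec = mpiexec        # callable to run in its own thread
        self.started = False          # flag: has the mpiexec thread been launched?
        self.declined = []            # offers turned down (illustration only)
        self.thread = None

    def resource_offers(self, offers):
        for offer in offers:
            if self.mpds_launched >= self.total_mpds:
                # Enough mpds already: decline, but keep looping so that
                # every offer in this batch gets a response.
                self.declined.append(offer)
                continue
            # Placeholder for launching an mpd task on this offer.
            self.mpds_launched += 1

        # This point is reached on every callback, so the launch is
        # guarded by the flag to avoid starting mpiexec more than once.
        if self.mpds_launched >= self.total_mpds and not self.started:
            self.started = True
            self.thread = threading.Thread(target=self.mpiexec)
            self.thread.start()


if __name__ == "__main__":
    sched = MPIScheduler(2, lambda: print("mpiexec started"))
    sched.resource_offers(["offer1", "offer2", "offer3"])
    sched.resource_offers(["offer4"])  # does not start a second thread
    sched.thread.join()
```

With the flag in place, the loop always runs to completion and later callbacks only decline offers.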


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 
> 'driver.stop()' to statusUpdate() so that the driver stops once all tasks 
> have finished, which happens when all of the mpd processes launched by the 
> executor have exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually 
> kills mpd processes. They might be useful during abnormal (cleanup) and 
> normal (shutdown) framework/executor termination...I think. cleanup() still 
> terminates all mpds on the slave, but shutdown() doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's 
> associated mpd process.
> -Readme
>   -> Included additional info on how to set up and run MPICH2 1.2 and 
> nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>
