Hmm.  I'm of two minds here.

I can see Howard's point: adding complexity is usually a bad thing.

But we have gotten this kind of problem report many times over the years: 
someone *thinks* they built with support for launcher X (e.g., TM, LSF), later 
notices that things aren't running as expected, and, after a bunch of work, 
figures out that it's because they didn't actually build with support for X.

Gilles' idea actually sounds interesting: if the tm module detects some of the 
sentinel PBS/TM env variables but we don't have full TM support compiled in, 
emit a show_help() message.  This would save some users a bunch of time and 
frustration.
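As a rough illustration of the detection half of that idea, here is a Python sketch (the real module would be C inside Open MPI). It assumes `PBS_ENVIRONMENT` and `PBS_JOBID` are the sentinel variables and takes the compile-time "do we have TM?" answer as a plain boolean; the function names and message text are hypothetical, not Open MPI's actual code or help text.

```python
import os
from typing import Mapping, Optional

# Assumption: PBS Pro and Torque export these into a job's environment.
# The sentinel set a real module would check may differ.
PBS_SENTINELS = ("PBS_ENVIRONMENT", "PBS_JOBID")


def running_under_pbs(env: Mapping[str, str]) -> bool:
    """True if every sentinel PBS/TM variable is present in env."""
    return all(name in env for name in PBS_SENTINELS)


def tm_missing_warning(have_tm: bool, env: Mapping[str, str]) -> Optional[str]:
    """Return a show_help()-style message if we are under PBS without TM support.

    Returns None when TM support was compiled in or when no PBS job is detected.
    """
    if have_tm or not running_under_pbs(env):
        return None
    return (
        f"This job appears to be running under PBS/Torque (job {env['PBS_JOBID']}),\n"
        "but Open MPI was built without TM support, so mpirun cannot see the\n"
        "job's allocation and may launch on one node only.\n"
        "Reconfigure with --with-tm=<path to PBS/Torque> to enable TM support."
    )


if __name__ == "__main__":
    # In a real build, have_tm would be decided by configure; hard-coded here.
    msg = tm_missing_warning(have_tm=False, env=os.environ)
    if msg:
        print(msg)
```

Keeping the check a pure function of `(have_tm, env)` makes it easy to exercise without a real PBS job.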

--> Keep in mind that the SLURM launcher is different: it's entirely CLI-based 
(not API-based), so we always build it (there are no headers or libraries to 
find).

FWIW, we do have precedent for adding MCA params that let users turn off 
warnings they don't want to see.
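An opt-out for such a warning could follow that precedent. Open MPI lets any MCA parameter be set from the environment as `OMPI_MCA_<name>` (or with `mpirun --mca <name> <value>`). The sketch below assumes a purely hypothetical parameter named `ras_tm_warn_missing`; the set of accepted "off" values is likewise an assumption of the sketch.

```python
from typing import Mapping

# Hypothetical MCA parameter, read through Open MPI's OMPI_MCA_<name>
# environment-variable convention.
WARN_PARAM = "OMPI_MCA_ras_tm_warn_missing"

# Assumption: values treated as "off" for this sketch.
FALSE_VALUES = {"0", "false", "no", "off"}


def warning_enabled(env: Mapping[str, str]) -> bool:
    """Warn by default; let users opt out via the hypothetical MCA param."""
    value = env.get(WARN_PARAM)
    if value is None:
        return True
    return value.strip().lower() not in FALSE_VALUES
```

Defaulting to "on" means the users who need the warning see it without doing anything, while someone who deliberately runs under PBS without TM could silence it with, e.g., `mpirun --mca ras_tm_warn_missing 0 ...` (hypothetical name).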

I guess the question here is: is there a valid use case for running in 
PBS/Torque and *not* wanting to use the TM launcher?





> On Jan 25, 2016, at 10:11 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
> 
> Hi Gilles
> 
> I would prefer improving the FAQ rather than adding yet more complexity in 
> this area.  The way things go, we would add this feature, then someone else 
> with a different use case would complain that we had broken something for 
> them.  Then we would add another MCA param to disable the new TM-less module, 
> etc.
> 
> I think the FAQ should be more explicit about the configure options required 
> for the ORTE/resource manager integration to work.
> 
> Howard
> 
> On Jan 24, 2016 5:17 PM, "Gilles Gouaillardet" <gil...@rist.or.jp> wrote:
> Folks,
> 
> There was a question about MTT on the MTT mailing list: 
> http://www.open-mpi.org/community/lists/mtt-users/2016/01/0840.php
> 
> After a few emails (some offline), it seems that it was a configuration issue.
> The user is running PBSPro, and it seems Open MPI was not configured with the 
> tm module (i.e., tm is not installed in the default location, and he did not 
> configure with --with-tm=/.../pbspro).
> 
> In this case, the tm module is not built, and when a job runs under PBSPro 
> without any hostfile, the job runs on one node only.
> To make this easier to diagnose, what about always building the tm module:
> - if tm is found by configury, build the Open MPI tm modules as usual
> - if tm is not found by configury, build a dumb module that will issue a 
>   warning or abort if a job is run under PBS/Torque
>   (e.g., some PBS-specific environment variables are defined)
> 
> Of course, the spec of this "dumb" module can be improved, for example:
> - add an MCA parameter to disable the warning
> - issue the warning only if there is more than one node in the job *and* 
>   neither a machinefile nor a host list was passed on the mpirun command line
> 
> Any thoughts?
> 
> Cheers,
> 
> Gilles
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/01/18497.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/01/18505.php


-- 
Jeff Squyres
jsquy...@cisco.com
