Since disabling oob:ud eliminated all of the messages, I am assuming that the warnings are from orted, which by its nature is expecting to fork()+exec() the application process. So, I would not be surprised if there is no setting to disable verbs fork support in oob:ud.
However, since the node executing mpirun is not connected to the IB network, it would seem logical to avoid all of these warnings (from a component that cannot run anyway).

-Paul

BTW: Using different nodes was all that was needed to get mtl:psm working. So, I assume the ones I had been using need a reboot.

On Wed, Mar 4, 2015 at 11:51 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I wonder if this is why we invented the "-1" default value for enabling
> verbs fork support -- because there are legitimate cases where
> ibv_fork_init() fails, and the user doesn't care. Hence, -1 allows it to
> fail and no one cares.
>
> Can you tell us why ibv_fork_init() would fail?
>
>
> On Mar 4, 2015, at 9:56 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > I have a system with InfiniPath HCAs, where I've historically tested mtl:psm.
> > For some reason, that appears to have ceased working some time in the past 4 months.
> > However, this report is about something else.
> > I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2
> >
> > When I ran configure, verbs support was found even though it was not my intent to use it.
> > So, I am running with an explicit btl list that omits verbs and am disabling the broken mtl:psm and mtl:ofi as well.
> > However, I am getting complaints from some verbs-related code:
> >
> > $ mpirun -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16 examples/ring_c
> > libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> > libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> > Hostname: n16
> > Error (22): Invalid argument
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> > Hostname: n15
> > Error (22): Invalid argument
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> > Hostname: n16
> > Error (22): Invalid argument
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> > Hostname: n15
> > Error (22): Invalid argument
> > --------------------------------------------------------------------------
> > Process 0 sending 10 to 1, tag 201 (2 processes in ring)
> > Process 0 sent to 1
> > Process 0 decremented value: 9
> > Process 0 decremented value: 8
> > Process 0 decremented value: 7
> > Process 0 decremented value: 6
> > Process 0 decremented value: 5
> > Process 0 decremented value: 4
> > Process 0 decremented value: 3
> > Process 0 decremented value: 2
> > Process 0 decremented value: 1
> > Process 0 decremented value: 0
> > Process 0 exiting
> > Process 1 exiting
> >
> >
> > There are at least THREE things "wrong" in my opinion.
> >
> > The first is the following two lines:
> > libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> > libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> > However, I can run ibv_devinfo (and see ACTIVE ports) on both of the compute nodes.
> > So, these appear to me to be a complaint about the login node (which is simply not on the IB network).
> > I did not ask for ibv, and even if I did the message about a non-IB login node is just an annoyance.
> >
> > The second is the "ibv_fork_init()" message twice per host, again when I have NOT requested btl:verbs.
> >
> > The third is that I had to pass so many mca params just to get as far as this!
> >
> > I did find that adding "-mca oob tcp" eliminated all the messages.
> > So, I am assuming oob:ud is responsible for this mess.
> >
> > This does not appear to be a very good default behavior.
> > + I believe oob:ud should *silently* disqualify itself when the node running "mpirun" is not on the IB network.
> > + I don't know why/when the ibv_fork_init() messages came about but they are quite annoying when I don't even intend to *use* ibv.
> >
> > -Paul
> >
> >
> > --
> > Paul H. Hargrove phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/03/17093.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/03/17094.php
>

--
Paul H. Hargrove phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
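
P.S. For anyone wanting to reproduce the workaround, here is a minimal sketch of the command line with "-mca oob tcp" added. Every flag, host name, and path is copied from the report quoted above; the helper name ring_cmd is hypothetical and exists only for this example. The script prints the command rather than running it, so it can be inspected on a machine without Open MPI installed.

```shell
#!/bin/sh
# Sketch only: prints the mpirun command line instead of executing it.
# ring_cmd is a hypothetical helper name, not part of Open MPI.

ring_cmd() {
    # "-mca oob tcp" limits the out-of-band framework to the tcp component;
    # in this report that was enough to make the verbs warnings go away.
    # The remaining options are unchanged from the original command.
    printf '%s\n' "mpirun -mca oob tcp -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16 examples/ring_c"
}

ring_cmd
```

To actually launch the job, the printed line can be pasted into a shell on a system where n15 and n16 are reachable; whether the same workaround helps on other clusters is an assumption I have not tested.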