Since disabling oob:ud eliminated all of the messages, I assume that the
warnings come from orted, which by its nature expects to fork()+exec() the
application processes. So I would not be surprised if there is no setting
to disable verbs fork support in oob:ud.
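
Jeff's description of the "-1" default (quoted below) implies a three-way
policy for verbs fork support. Here is a minimal C sketch of that policy. It
assumes -1 means "try, and ignore a failure", 0 means "don't try", and 1 means
"required, so report a failure". The enum, apply_fork_policy() and the stub
functions are hypothetical names, not Open MPI code. The real ibv_fork_init()
from libibverbs returns 0 or an errno value. Here it is replaced by a function
pointer so the sketch compiles without libibverbs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcomes of applying a fork-support setting. */
enum fork_outcome {
    FORK_SKIPPED,          /* want == 0: the init function is never called  */
    FORK_ENABLED,          /* the init function returned 0                  */
    FORK_FAILED_SILENTLY,  /* want == -1: failure ignored, nothing printed  */
    FORK_FAILED_LOUDLY     /* want == 1: failure reported on stderr         */
};

/* Stand-in for ibv_fork_init(): returns 0 on success or an errno value. */
typedef int (*fork_init_fn)(void);

/* ASSUMED semantics: -1 = try, ignore failure; 0 = don't try; 1 = required. */
static enum fork_outcome apply_fork_policy(int want, fork_init_fn init)
{
    assert(want >= -1 && want <= 1);
    if (want == 0) {
        return FORK_SKIPPED;
    }
    int rc = init();
    if (rc == 0) {
        return FORK_ENABLED;
    }
    if (want < 0) {
        /* The "-1" default: a failure is tolerated without any message. */
        return FORK_FAILED_SILENTLY;
    }
    fprintf(stderr,
            "Fork support was requested but ibv_fork_init() failed: %s (%d)\n",
            strerror(rc), rc);
    return FORK_FAILED_LOUDLY;
}

/* Hypothetical stubs for the two behaviors seen in this thread. */
static int fork_init_ok(void) { return 0; }
static int fork_init_einval(void) { return EINVAL; } /* "Error (22)" on Linux */
```

If this reading is right, the help messages below correspond to the want == 1
path. A component that cannot use IB on a given node would arguably want the
-1 behavior instead.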

However, since the node executing mpirun is not connected to the IB
network, it would seem logical to avoid all of these warnings (from a
component that cannot run anyway).

-Paul

BTW:
Using different nodes was all that was needed to get mtl:psm working.
So, I assume the ones I had been using need a reboot.

On Wed, Mar 4, 2015 at 11:51 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I wonder if this is why we invented the "-1" default value for enabling
> verbs fork support: there are legitimate cases where ibv_fork_init()
> fails and the user doesn't care.  Hence, -1 allows it to fail silently.
>
> Can you tell us why ibv_fork_init() would fail?
>
>
>
> > On Mar 4, 2015, at 9:56 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > I have a system with InfiniPath HCAs, where I've historically tested mtl:psm.
> > For some reason, that appears to have stopped working some time in the past 4 months.
> > However, this report is about something else.
> > I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2
> >
> > When I ran configure, verbs support was found even though I did not intend to use it.
> > So, I am running with an explicit btl list that omits verbs, and I am disabling the broken mtl:psm and mtl:ofi as well.
> > However, I am getting complaints from some verbs-related code:
> >
> > $ mpirun -mca btl sm,self,tcp -mca mtl ^psm,ofi -np 2 -host n15,n16 examples/ring_c
> > libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> > libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> >   Hostname:    n16
> >   Error (22):  Invalid argument
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> >   Hostname:    n15
> >   Error (22):  Invalid argument
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> >   Hostname:    n16
> >   Error (22):  Invalid argument
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > Fork support was requested but the library call ibv_fork_init() failed.
> >
> >   Hostname:    n15
> >   Error (22):  Invalid argument
> > --------------------------------------------------------------------------
> > Process 0 sending 10 to 1, tag 201 (2 processes in ring)
> > Process 0 sent to 1
> > Process 0 decremented value: 9
> > Process 0 decremented value: 8
> > Process 0 decremented value: 7
> > Process 0 decremented value: 6
> > Process 0 decremented value: 5
> > Process 0 decremented value: 4
> > Process 0 decremented value: 3
> > Process 0 decremented value: 2
> > Process 0 decremented value: 1
> > Process 0 decremented value: 0
> > Process 0 exiting
> > Process 1 exiting
> >
> >
> > There are at least THREE things "wrong" in my opinion.
> >
> > The first is the following two lines:
> > libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
> > libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> > However, I can run ibv_devinfo (and see ACTIVE ports) on both of the compute nodes.
> > So, these appear to me to be complaints about the login node (which is simply not on the IB network).
> > I did not ask for ibv, and even if I had, the message about a non-IB login node is just an annoyance.
> >
> > The second is the "ibv_fork_init()" message, twice per host, again when I have NOT requested btl:verbs.
> >
> > The third is that I had to pass so many mca params just to get this far!
> >
> > I did find that adding "-mca oob tcp" eliminated all the messages.
> > So, I am assuming oob:ud is responsible for this mess.
> >
> > This does not appear to be a very good default behavior.
> > + I believe oob:ud should *silently* disqualify itself when the node running "mpirun" is not on the IB network.
> > + I don't know why/when the ibv_fork_init() messages came about, but they are quite annoying when I don't even intend to *use* ibv.
> >
> > -Paul
> >
> >
> > --
> > Paul H. Hargrove                          phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department               Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/03/17093.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/03/17094.php
>



