Looks like there is no issue in 1.8.4 except for the message coalescing
bug. Ralph, Howard, and I agree that disabling message coalescing for
1.8.4 is the safest way forward; we can back-port the real fix for an
eventual 1.8.5. Message rates no longer depend on message coalescing in
the openib btl, and we beat mvapich handily without the feature.
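(For anyone who wants to turn the feature off by hand on an existing
build rather than wait for 1.8.4, the usual knob is the
btl_openib_use_message_coalescing MCA parameter -- parameter name
assumed from the openib btl; verify it on your install with ompi_info:)

```shell
# Disable message coalescing in the openib btl for a single run.
# Parameter name assumed; confirm with: ompi_info --param btl openib
mpirun --mca btl_openib_use_message_coalescing 0 ./my_mpi_app
```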

-Nathan

On Tue, Nov 04, 2014 at 10:27:56PM +0000, Jeff Squyres (jsquyres) wrote:
> That sounds fine, but I think Steve's point is that he is being bitten by 
> this bug now, so it would probably be good to include this one 
> particular fix in 1.8.4.
> 
> 
> On Nov 4, 2014, at 5:24 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
> > Going to put the RFC out today with a timeout of about 2 weeks. This
> > will give me some time to talk with other Open MPI developers
> > face-to-face at SC14.
> > 
> > If the RFC fails I will still bring that and a couple of other fixes
> > into the master.
> > 
> > -Nathan
> > 
> > On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve Wise wrote:
> >>   Ok, sounds like I should let you continue the good work! :)  When do you
> >>   plan to merge this into ompi proper?
> >> 
> >>   On 11/4/2014 3:58 PM, Nathan Hjelm wrote:
> >> 
> >> That certainly addresses part of the problem. I am working on a complete
> >> revamp of the btl RDMA interface. It contains this fix:
> >> 
> >> https://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316
> >> 
> >> -Nathan
> >> 
> >> On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote:
> >> 
> >> I found the bug.  Here is the fix:
> >> 
> >> [root@stevo1 openib]# git diff
> >> diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c
> >> index d876e21..8a5ea82 100644
> >> --- a/opal/mca/btl/openib/btl_openib_component.c
> >> +++ b/opal/mca/btl/openib/btl_openib_component.c
> >> @@ -1960,9 +1960,8 @@ static int init_one_device(opal_list_t *btl_list, struct ibv_device* ib_dev)
> >>          }
> >> 
> >>          /* If the MCA param was specified, skip all the checks */
> >> -        if ( MCA_BASE_VAR_SOURCE_COMMAND_LINE ||
> >> -                MCA_BASE_VAR_SOURCE_ENV ==
> >> -            mca_btl_openib_component.receive_queues_source) {
> >> +        if (MCA_BASE_VAR_SOURCE_COMMAND_LINE == mca_btl_openib_component.receive_queues_source ||
> >> +            MCA_BASE_VAR_SOURCE_ENV == mca_btl_openib_component.receive_queues_source) {
> >>              goto good;
> >>          }
> >> 
> >> 
> >> On 11/4/2014 3:08 PM, Nathan Hjelm wrote:
> >> 
> >> I have run into the issue as well. I will open a pull request for 1.8.4
> >> as part of a patch fixing the coalescing issues.
> >> 
> >> -Nathan
> >> 
> >> On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote:
> >> 
> >> On 11/4/2014 2:09 PM, Steve Wise wrote:
> >> 
> >> Hi,
> >> 
> >> I'm running ompi top-of-tree from github and seeing an openib btl issue
> >> where the qp/srq configuration is incorrect for the given device id.  This
> >> works fine in 1.8.4rc1, but I see the problem in top-of-tree.  A simple
> >> 2-node IMB-MPI1 pingpong fails to get the ranks set up.  I see this logged:
> >> 
> >> /opt/ompi-trunk/bin/mpirun --allow-run-as-root --np 2 --host stevo1,stevo2
> >> --mca btl openib,sm,self /opt/ompi-trunk/bin/IMB-MPI1 pingpong
> >> 
> >> 
> >> Adding this works around the issue:
> >> 
> >> --mca btl_openib_receive_queues P,65536,64
> >> 
> >> I also confirmed that opal_btl_openib_ini_query() is getting the correct
> >> receive_queues string from the .ini file on both nodes for the cxgb4
> >> device...
> >> 
> >> 
> >> 
> >> <snip>
> >> 
> >> --------------------------------------------------------------------------
> >> 
> >> The Open MPI receive queue configuration for the OpenFabrics devices
> >> on two nodes are incompatible, meaning that MPI processes on two
> >> specific nodes were unable to communicate with each other.  This
> >> generally happens when you are using OpenFabrics devices from
> >> different vendors on the same network.  You should be able to use the
> >> mca_btl_openib_receive_queues MCA parameter to set a uniform receive
> >> queue configuration for all the devices in the MPI job, and therefore
> >> be able to run successfully.
> >> 
> >>  Local host:       stevo2
> >>  Local adapter:    cxgb4_0 (vendor 0x1425, part ID 21520)
> >>  Local queues: 
> >> P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64
> >> 
> >>  Remote host:      stevo1
> >>  Remote adapter:   (vendor 0x1425, part ID 21520)
> >>  Remote queues:    P,65536,64
> >> ----------------------------------------------------------------------------
> >> 
> >> 
> >> The stevo1 rank has the correct queue settings: P,65536,64.  For some
> >> reason, stevo2 has the wrong settings, even though it has the correct
> >> device id info.
> >> 
> >> Any suggestions on debugging this?  E.g., where in the source should I
> >> dig to see if the .ini parsing is somehow broken...
> >> 
> >> 
> >> Thanks,
> >> 
> >> Steve.
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/11/16179.php
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
