Should be fixed by PR #4569 (https://github.com/open-mpi/ompi/pull/4569). Please test and let me know.
-Nathan

> On Dec 1, 2017, at 7:37 AM, DERBEY, NADIA <nadia.der...@atos.net> wrote:
>
> Hi,
>
> Our validation team detected a hang when running the osu_bibw
> micro-benchmark from the OMB 5.3 suite on Open MPI 2.0.2 (note that the
> same hang appears with Open MPI 3.0). The hang occurs when calling
> osu_bibw on a single node (vader btl) with the options "-x 100 -i 1000".
> The -x option changes the warmup loop size; the -i option changes the
> measured loop size.
>
> For each exchanged message size, osu_bibw loops doing the following
> sequence on both ranks:
> . posts 64 non-blocking sends
> . posts 64 non-blocking receives
> . waits for all the send requests to complete
> . waits for all the receive requests to complete
>
> The loop size is the sum of:
> . options.skip (the warmup phase, changed with the -x option)
> . options.loop (the actually measured loop, changed with the -i option)
>
> The default values are the following:
>
> +==============+======+======+
> | message size | skip | loop |
> |==============+======+======|
> | <= 8K        |   10 |  100 |
> | > 8K         |    2 |   20 |
> +==============+======+======+
>
> As said above, the test hangs when moving to more aggressive loop
> values: 100 for skip and 1000 for loop.
>
> mca_btl_vader_frag_alloc() calls opal_free_list_get() to get a fragment
> from the appropriate free list. If there are no free fragments left,
> opal_free_list_get() calls opal_free_list_grow(), which in turn calls
> mca_btl_vader_frag_init() (the initialization routine for the vader btl
> fragments). This routine checks whether there is enough space left in
> the mapped memory segment for the requested fragment size (current
> offset + fragment size should be <= segment size), and it makes
> opal_free_list_grow() fail if the shared memory segment is exhausted.
>
> As soon as we begin exhausting memory, the two ranks lose
> synchronization and the test rapidly hangs. To avoid this hang, I found
> two possible solutions:
>
> 1) Change the vader btl segment size: I set it to 4GB. To be able to do
> this, I had to change the type parameter in the parameter registration
> to MCA_BASE_VAR_TYPE_SIZE_T.
>
> 2) Replace the call to opal_free_list_get() with a call to
> opal_free_list_wait() in mca_btl_vader_frag_alloc(). This also makes
> the micro-benchmark run to completion.
>
> So my question is: which would be the best approach (#1 or #2)? And the
> underlying question is: what is the reason for favoring
> opal_free_list_get() over opal_free_list_wait()?
>
> Thanks
>
> --
> Nadia Derbey - B1-387
> HPC R&D - MPI
> Tel: +33 4 76 29 77 62
> nadia.der...@atos.net
> 1 Rue de Provence BP 208
> 38130 Echirolles Cedex, France
> www.atos.com
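For readers following along, the communication pattern described in the
quoted message boils down to roughly the following (a minimal
self-contained sketch, not the actual OMB source; the message size,
tag, and timing code are simplified):

    /* Sketch of the osu_bibw inner loop described above (simplified;
     * not the actual OMB source). Both ranks run the same sequence:
     * a window of non-blocking sends, a window of non-blocking
     * receives, then two MPI_Waitall calls. */
    #include <mpi.h>
    #include <stdlib.h>

    #define WINDOW_SIZE 64

    int main (int argc, char **argv)
    {
        int rank, size = 8192, skip = 100, loop = 1000; /* -x 100 -i 1000 */
        char *s_buf = malloc (size), *r_buf = malloc (size);
        MPI_Request sreq[WINDOW_SIZE], rreq[WINDOW_SIZE];

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        int peer = 1 - rank;                            /* two ranks only */

        for (int i = 0; i < skip + loop; i++) {
            for (int j = 0; j < WINDOW_SIZE; j++)
                MPI_Isend (s_buf, size, MPI_CHAR, peer, 100,
                           MPI_COMM_WORLD, &sreq[j]);
            for (int j = 0; j < WINDOW_SIZE; j++)
                MPI_Irecv (r_buf, size, MPI_CHAR, peer, 100,
                           MPI_COMM_WORLD, &rreq[j]);
            MPI_Waitall (WINDOW_SIZE, sreq, MPI_STATUSES_IGNORE);
            MPI_Waitall (WINDOW_SIZE, rreq, MPI_STATUSES_IGNORE);
        }

        MPI_Finalize ();
        free (s_buf); free (r_buf);
        return 0;
    }

With -x 100 -i 1000 each rank pushes 64 requests per iteration over 1100
iterations per message size, which explains why the larger iteration
counts put much more pressure on the vader free lists than the defaults.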
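For reference, a sketch of the difference between the two free-list
calls behind option #2, as paraphrased from opal/class/opal_free_list.h
(not verbatim; the real mca_btl_vader_frag_alloc() lives in
opal/mca/btl/vader/btl_vader_frag.h and does more bookkeeping):

    #include "opal/class/opal_free_list.h"

    static opal_free_list_item_t *frag_alloc_get (opal_free_list_t *list)
    {
        /* non-blocking: returns NULL when the list is empty and
         * opal_free_list_grow() fails, e.g. because the vader
         * shared-memory segment is exhausted */
        return opal_free_list_get (list);
    }

    static opal_free_list_item_t *frag_alloc_wait (opal_free_list_t *list)
    {
        /* blocking: roughly equivalent to retrying the non-blocking
         * get while driving opal_progress(), so it returns only once
         * an in-flight fragment has been returned to the list */
        return opal_free_list_wait (list);
    }

This is why option #2 cures the hang: instead of failing the allocation
when the segment is exhausted, the caller progresses the library until a
fragment is recycled.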
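For what it's worth, once option #1 is in place (the parameter
registered as MCA_BASE_VAR_TYPE_SIZE_T), the larger segment can be
requested from the command line; something like the following, where
btl_vader_segment_size is the existing parameter name and 4294967296 is
the 4GB value mentioned above:

    mpirun --mca btl self,vader --mca btl_vader_segment_size 4294967296 \
           -np 2 ./osu_bibw -x 100 -i 1000

With the current registration the 4GB value does not fit in the
parameter's type, which is exactly why the type change is needed.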