Artem, thanks for the feedback.
I committed the patch to the trunk (r31922). As I indicated in the commit log, this patch is likely suboptimal and has room for improvement.

Jeff commented about the usnic-related issue, so I will wait for a fix from the Cisco folks.

Cheers,

Gilles

On Sun, Jun 1, 2014 at 10:12 PM, Artem Polyakov <artpo...@gmail.com> wrote:
>
> I tested your approach. Both:
> a) export OMPI_MCA_btl_openib_use_eager_rdma=0
> b) applying your patch and running without "export
> OMPI_MCA_btl_openib_use_eager_rdma=0"
> work well for me.
> This fixes the first part of the problem: when OMPI_MCA_btl="openib,self"
>
> However, once I comment out this statement, thus giving OMPI the right to
> decide which BTL to use, the program hangs again. Here is additional information
> that can be useful:
>
> 1. If I set 1 slot per node, this problem doesn't arise.
>
> 2. If I use at least 2 cores per node, I can see this hang.
> Here are the backtraces for all branches of the hung program:
>
>