Thanks Josh.

Steve -- if you can confirm that this fixes your problem in the v1.6 series, 
we'll go ahead and commit the patch.

FWIW: the OpenFabrics startup code got a little cleanup/revamp on the 
trunk/v1.7 -- I suspect that's why you're not seeing the problem on trunk/v1.7 
(e.g., look at the utility routines that were abstracted out to 
ompi/mca/common/verbs).



On Jan 29, 2013, at 2:41 AM, Joshua Ladd <josh...@mellanox.com> wrote:

> So, we (Mellanox) have observed this ourselves when no suitable CPC can be 
> found.  It seems the BTL associated with the port is not destroyed and the 
> device's reference count is not decreased.  Not sure why you don't see the 
> problem in 1.7, but we have a patch that I'll CMR today.  Please review our 
> symptoms, diagnosis, and proposed change.  Ralph, maybe I can list you as a 
> reviewer of the patch?  I've reviewed it myself and it looks fine, but I 
> wouldn't mind having another set of eyes on it, since I don't want to be 
> responsible for breaking the openib BTL.
> 
> Thanks,
> 
> Josh Ladd
> 
> 
> Reported by Yossi:
> Hi,
> 
> There is a bug in Open MPI (the openib component) when one of the active 
> ports is Ethernet.
> The fix is attached; it probably needs to be reviewed and submitted to ompi.
> 
> Error flow:
> 1.  The openib component creates a btl instance for every active port 
>     (including the Ethernet port).
> 2.  Every btl holds a reference count on the device 
>     (mca_btl_openib_device_t::btls).
> 3.  The openib component tries to create a "connection module" (CPC) for 
>     every btl.
> 4.  It fails to create a connection module for the Ethernet port.
> 5.  The btl for the Ethernet port is not returned by the openib component 
>     in the list of btl modules.
> 6.  The btl for the Ethernet port is therefore not destroyed during openib 
>     component finalize.
> 7.  The device is not destroyed, because of the outstanding reference count.
> 8.  The memory pool created by the device is not destroyed.
> 9.  Later, the rdma mpool module cleans up remaining pools during its 
>     finalize.
> 10. The memory pool created by openib is destroyed by rdma mpool component 
>     finalize.
> 11. That memory pool points to a function (openib_dereg_mr) which has 
>     already been unloaded from memory (because mca_btl_openib.so was 
>     unloaded).
> 12. Segfault, because of a call through an invalid function pointer.
> 
> The fix:  If a btl module is not going to be returned from openib component 
> init, destroy it.
> 
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Monday, January 28, 2013 8:35 PM
> To: Steve Wise
> Cc: Open MPI Developers
> Subject: Re: [OMPI devel] openib unloaded before last mem dereg
> 
> Out of curiosity, could you tell us how you configured OMPI?
> 
> 
> On Jan 28, 2013, at 12:46 PM, Steve Wise <sw...@opengridcomputing.com> wrote:
> 
>> On 1/28/2013 2:04 PM, Ralph Castain wrote:
>>> On Jan 28, 2013, at 11:55 AM, Steve Wise <sw...@opengridcomputing.com> 
>>> wrote:
>>> 
>>>> Do you know if the rdmacm CPC is really being used for your connection 
>>>> setup (vs. other CPCs supported by IB)?  iWARP only supports rdmacm, so 
>>>> maybe that's the difference?
>>> Dunno for certain, but I expect it is using the OOB cm since I didn't 
>>> direct it to do anything different. Like I said, I suspect the problem is 
>>> that the cluster doesn't have iWARP on it.
>> 
>> Definitely, or it could be that the different CPC used for iWARP vs. IB is 
>> tickling the issue.
>> 
>>>> Steve.
>>>> 
>>>> On 1/28/2013 1:47 PM, Ralph Castain wrote:
>>>>> Nope - still works just fine. I didn't receive that warning at all, and 
>>>>> it ran to completion without problem.
>>>>> 
>>>>> I suspect the problem is that the system I can use just isn't 
>>>>> configured like yours, and so I can't trigger the problem. Afraid I 
>>>>> can't be of help after all... :-(
>>>>> 
>>>>> 
>>>>> On Jan 28, 2013, at 11:25 AM, Steve Wise <sw...@opengridcomputing.com> 
>>>>> wrote:
>>>>> 
>>>>>> On 1/28/2013 12:48 PM, Ralph Castain wrote:
>>>>>>> Hmmm...afraid I cannot replicate this using the current state of the 
>>>>>>> 1.6 branch (which is the 1.6.4rcN) on the only IB-based cluster I can 
>>>>>>> access.
>>>>>>> 
>>>>>>> Can you try it with a 1.6.4 tarball and see if you still see the 
>>>>>>> problem? Could be someone already fixed it.
>>>>>> I still hit it on 1.6.4rc2.
>>>>>> 
>>>>>> Note that iWARP != IB, so you may not have this issue on IB systems for 
>>>>>> various reasons.  Did you use the same mpirun command line?  Namely, 
>>>>>> with:
>>>>>> 
>>>>>> --mca btl_openib_ipaddr_include "192.168.170.0/24"
>>>>>> 
>>>>>> (adjusted to your network config).
>>>>>> 
>>>>>> Because if I don't use ipaddr_include, then I don't see this issue on my 
>>>>>> setup.
>>>>>> 
>>>>>> Also, did you see these logged:
>>>>>> 
>>>>>> Right after starting the job:
>>>>>> 
>>>>>> --------------------------------------------------------------------------
>>>>>> No OpenFabrics connection schemes reported that they were able to be 
>>>>>> used on a specific port.  As such, the openib BTL (OpenFabrics support) 
>>>>>> will be disabled for this port.
>>>>>> 
>>>>>> Local host:           hpc-hn1.ogc.int
>>>>>> Local device:         cxgb4_0
>>>>>> Local port:           2
>>>>>> CPCs attempted:       oob, xoob, rdmacm
>>>>>> --------------------------------------------------------------------------
>>>>>> ...
>>>>>> 
>>>>>> At the end of the job:
>>>>>> 
>>>>>> [hpc-hn1.ogc.int:07850] 5 more processes have sent help message 
>>>>>> help-mpi-btl-openib-cpc-base.txt / no cpcs for port
>>>>>> 
>>>>>> 
>>>>>> I think these are benign, but they probably indicate a bug: the mpirun 
>>>>>> command line restricts the job to port 1 only, so the CPCs shouldn't be 
>>>>>> attempting port 2...
>>>>>> 
>>>>>> Steve.
>>>>>> 
>>>>>> 
>>>>>>> On Jan 28, 2013, at 10:03 AM, Steve Wise <sw...@opengridcomputing.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> On 1/28/2013 11:48 AM, Ralph Castain wrote:
>>>>>>>>> On Jan 28, 2013, at 9:12 AM, Steve Wise <sw...@opengridcomputing.com> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On 1/25/2013 12:19 PM, Steve Wise wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>> 
>>>>>>>>>>> I'm tracking an issue I see in openmpi-1.6.3.  Running this command 
>>>>>>>>>>> on my chelsio iwarp/rdma setup causes a seg fault every time:
>>>>>>>>>>> 
>>>>>>>>>>> /usr/mpi/gcc/openmpi-1.6.3-dbg/bin/mpirun --np 2 --host 
>>>>>>>>>>> hpc-hn1,hpc-cn2 --mca btl openib,sm,self --mca 
>>>>>>>>>>> btl_openib_ipaddr_include "192.168.170.0/24" 
>>>>>>>>>>> /usr/mpi/gcc/openmpi-1.6.3/tests/IMB-3.2/IMB-MPI1 pingpong
>>>>>>>>>>> 
>>>>>>>>>>> The segfault is during finalization, and I've debugged this to the 
>>>>>>>>>>> point where I see a call to dereg_mem() after the openib btl is 
>>>>>>>>>>> unloaded via dlclose().  dereg_mem() dereferences a function 
>>>>>>>>>>> pointer to call the btl-specific dereg function, in this case 
>>>>>>>>>>> openib_dereg_mr().  However, since that btl has already been 
>>>>>>>>>>> unloaded, the dereference causes a seg fault.  This happens every 
>>>>>>>>>>> time with the above mpi job.
>>>>>>>>>>> 
>>>>>>>>>>> Now, I tried this same experiment with openmpi-1.7rc6 and I don't 
>>>>>>>>>>> see the seg fault, and I don't see a call to dereg_mem() after the 
>>>>>>>>>>> openib btl is unloaded.  That's all well and good. :)  But I'd like 
>>>>>>>>>>> to get this fix pushed into 1.6, since that is the current stable 
>>>>>>>>>>> release.
>>>>>>>>>>> 
>>>>>>>>>>> Question:  Can someone point me to the fix in 1.7?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Steve.
>>>>>>>>>> It appears that in ompi_mpi_finalize(), mca_pml_base_close() is 
>>>>>>>>>> called, which unloads the openib btl.  Then, further down in 
>>>>>>>>>> ompi_mpi_finalize(), mca_mpool_base_close() is called, which ends up 
>>>>>>>>>> calling dereg_mem(), which seg faults trying to call into the 
>>>>>>>>>> unloaded openib btl.
>>>>>>>>>> 
>>>>>>>>> That definitely sounds like a bug
>>>>>>>>> 
>>>>>>>>>> Anybody have thoughts?  Anybody care? :)
>>>>>>>>> I care! It needs to be fixed - I'll take a look. Probably something 
>>>>>>>>> that someone forgot to CMR.
>>>>>>>> Great!  If you want me to try out a fix or gather more debug info, 
>>>>>>>> just holler.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Steve.
>>>>>>>> 
>> 
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

