The last message was from my test output; it makes no sense anyway.

It looks like a QP/CQ initialization problem, but it’s hard to find the 
exact place at the moment. I will try Gilles’ patch and see if it works for me.

PS: Actually, I made my patch from the 1.10 series, when OOB was removed. Gilles’ 
patch was made from 1.6.x, which worked for me too.


Thanks,
Shiqing

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
r...@open-mpi.org
Sent: Thursday, April 20, 2017 6:32 PM
To: OpenMPI Devel
Subject: Re: [OMPI devel] openib oob module

I’m not seeing any problem inside the OOB - the problem appears to be in the 
info being given to it:

[host1:16244] 1 more process has sent help message help-mpi-btl-openib.txt / 
default subnet prefix
[host1:16244] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help 
/ error messages
[[46697,1],0][btl_openib_component.c:3501:handle_wc] from host1 to: 
192.168.2.22 error polling LP CQ with status RETRY EXCEEDED ERROR status number 
12 for wr_id 112db80 opcode 32767  vendor error 129 qp_idx 0

I’ve been searching, and I don’t see that help message anywhere in your output 
- not sure what happened to it. I do see this in your output - don’t know what 
it means:

[host1][[46697,1],0][connect/btl_openib_connect_oob.c:935:rml_recv_cb] 
!!!!!!!!!!!!!!!!!!!!!!!!!


On Apr 20, 2017, at 8:36 AM, Shiqing Fan 
<shiqing....@huawei.com> wrote:

Forgot to enable oob verbose in my last test. Here is the updated output file.

Thanks,
Shiqing

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
r...@open-mpi.org
Sent: Thursday, April 20, 2017 4:29 PM
To: OpenMPI Devel
Subject: Re: [OMPI devel] openib oob module

Yeah, I forgot that the 1.10 series still had the BTLs in OMPI. Should be able 
to restore it. I honestly don’t recall the bug, though :-(

If you want to try reviving it, you can add some debug in there (plus turn on 
the OOB verbosity) and I’m happy to help you figure it out.
Ralph

On Apr 20, 2017, at 7:13 AM, Shiqing Fan 
<shiqing....@huawei.com> wrote:

Hi Ralph,

Yes, it’s been a long time. Hope you all are doing well (I believe so ☺ ).

I’m working on a virtualization project and need to run Open MPI on a 
unikernel OS (most of OFED is missing/unsupported).

Actually, I’m only focusing on 1.10.2, which still has oob in ompi. Might it 
be possible to make oob work there? Or even for the 1.10 branch (as Gilles 
mentioned)?
Do you have any clue about the bug in oob back then?

Regards,
Shiqing


From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
r...@open-mpi.org
Sent: Thursday, April 20, 2017 3:49 PM
To: OpenMPI Devel
Subject: Re: [OMPI devel] openib oob module

Hi Shiqing!

Been a long time - hope you are doing well.

I see no way to bring the oob module back now that the BTLs are in the OPAL 
layer. That is why it was removed: the oob is in ORTE, and thus not 
accessible from OPAL.
Ralph

On Apr 20, 2017, at 6:02 AM, Shiqing Fan 
<shiqing....@huawei.com> wrote:

Dear all,

I noticed that the openib oob module was removed a long time ago 
because it was no longer working and nobody seemed to need it.
But on some special operating systems, where kernel support for rdmacm, udcm, 
or ibcm is missing, oob may still be necessary.

Would it be possible to bring this module back? How difficult would it 
be to fix the bug so that it works again in the 1.10 branch or later? 
Thanks a lot.

Best Regards,
Shiqing
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


<output.txt>

