Thanks Gilles, I will try it out today and let you know if it's working for me 
or not.

Regards,
Shiqing

-----Original Message-----
From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Friday, April 21, 2017 9:41 AM
To: devel@lists.open-mpi.org
Subject: Re: [OMPI devel] openib oob module

Folks,


fwiw, I made https://github.com/open-mpi/ompi/pull/3393 and it works for me on 
an mlx4 cluster (Mellanox QDR)


Cheers,


Gilles


On 4/21/2017 1:31 AM, r...@open-mpi.org wrote:
> I’m not seeing any problem inside the OOB - the problem appears to be 
> in the info being given to it:
>
> [host1:16244] 1 more process has sent help message 
> help-mpi-btl-openib.txt / default subnet prefix
> [host1:16244] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
> all help / error messages
> [[46697,1],0][btl_openib_component.c:3501:handle_wc] from host1 to: 
> 192.168.2.22 error polling LP CQ with status RETRY EXCEEDED ERROR 
> status number 12 for wr_id 112db80 opcode 32767  vendor error 129 qp_idx 0
>
> I’ve been searching, and I don’t see that help message anywhere in 
> your output - not sure what happened to it. I do see this in your 
> output - don’t know what it means:
>
> [host1][[46697,1],0][connect/btl_openib_connect_oob.c:935:rml_recv_cb] 
> !!!!!!!!!!!!!!!!!!!!!!!!!
>
>
>> On Apr 20, 2017, at 8:36 AM, Shiqing Fan <shiqing....@huawei.com 
>> <mailto:shiqing....@huawei.com>> wrote:
>>
>> Forgot to enable oob verbose in my last test. Here is the updated 
>> output file.
>> Thanks,
>> Shiqing
>> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
>> r...@open-mpi.org <mailto:r...@open-mpi.org>
>> Sent: Thursday, April 20, 2017 4:29 PM
>> To: OpenMPI Devel
>> Subject: Re: [OMPI devel] openib oob module
>> Yeah, I forgot that the 1.10 series still had the BTLs in OMPI. 
>> Should be able to restore it. I honestly don’t recall the bug, though :-(
>> If you want to try reviving it, you can add some debug in there (plus 
>> turn on the OOB verbosity) and I’m happy to help you figure it out.
>> Ralph
>>
>>     On Apr 20, 2017, at 7:13 AM, Shiqing Fan <shiqing....@huawei.com
>>     <mailto:shiqing....@huawei.com>> wrote:
>>     Hi Ralph,
>>     Yes, it’s been a long time. Hope you all are doing well (I
>>     believe so :-)).
>>     I’m working on a virtualization project, and need to run Open MPI
>>     on a unikernel OS (most of OFED is missing/unsupported).
>>     Actually I’m only focusing on 1.10.2, which still has oob in
>>     ompi. Might it be possible to make oob work there? Or
>>     even for the 1.10 branch (as Gilles mentioned)?
>>     Do you have any clue about the bug in oob back then?
>>     Regards,
>>     Shiqing
>>     From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf
>>     Of r...@open-mpi.org <mailto:r...@open-mpi.org>
>>     Sent: Thursday, April 20, 2017 3:49 PM
>>     To: OpenMPI Devel
>>     Subject: Re: [OMPI devel] openib oob module
>>     Hi Shiqing!
>>     Been a long time - hope you are doing well.
>>     I see no way to bring the oob module back now that the BTLs are
>>     in the OPAL layer - this is why it was removed as the oob is in
>>     ORTE, and thus not accessible from OPAL.
>>     Ralph
>>
>>         On Apr 20, 2017, at 6:02 AM, Shiqing Fan
>>         <shiqing....@huawei.com <mailto:shiqing....@huawei.com>> wrote:
>>         Dear all,
>>         I noticed that the openib oob module was removed a
>>         long time ago, because it wasn’t working anymore and nobody
>>         seemed to need it.
>>         But for some special operating systems, where the rdmacm, udcm
>>         or ibcm kernel support is missing, oob may still be necessary.
>>         I’m curious if it’s possible to bring this module back? How
>>         difficult would it be to fix the bug in order to make it work
>>         again in the 1.10 branch or later? Thanks a lot.
>>         Best Regards,
>>         Shiqing
>>         _______________________________________________
>>         devel mailing list
>>         devel@lists.open-mpi.org <mailto:devel@lists.open-mpi.org>
>>         https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>
>>
>> <output.txt>
>
>
>

