Thank you very much, Chris. It is indeed TrueScale, and moving to kernel 2.6.32-504.23.4 solved it.

Regards,
Magnus

-----Original Message-----
From: Chris Hunter [mailto:[email protected]] 
Sent: 18 November 2015 23:14
To: Lassus, Magnus <[email protected]>
Cc: [email protected]
Subject: Re: [lustre-discuss] o2ib (ib_qib) with 2.7.0 rpms on centos 6.6

Are you using TrueScale IB interfaces?

There is a known TrueScale bug in the RHEL/CentOS 6.6 kernels. You should 
try kernel 2.6.32-504.23.4 or newer. Some details of the bug are in 
LU-6698 and RHSA-2015-1081.
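
To check whether a given node is already at or past that kernel, something 
like the sketch below may help. It is only a simplified numeric comparison 
of the version and release fields, not the full RPM version-ordering 
algorithm, and the names kernel_key and has_fix are made up for this example.

```python
import re

# Minimum kernel suggested in this thread for the TrueScale bug.
FIXED = "2.6.32-504.23.4"


def kernel_key(release):
    """Turn '2.6.32-504.8.1.el6_lustre.x86_64' into ((2, 6, 32), (504, 8, 1)).

    Assumption: the version part is purely dotted numbers and the release
    part starts with dotted numbers, as in the el6 kernels shown here.
    """
    version, _, rest = release.partition("-")
    # Keep only the leading dotted-numeric run of the release field and
    # stop at suffixes such as 'el6_lustre' or 'x86_64'.
    match = re.match(r"\d+(?:\.\d+)*", rest)
    rel = tuple(int(p) for p in match.group(0).split(".")) if match else ()
    return tuple(int(p) for p in version.split(".")), rel


def has_fix(release, minimum=FIXED):
    """Return True if 'release' compares at or above the minimum kernel."""
    return kernel_key(release) >= kernel_key(minimum)
```

On a live node the running kernel string can be taken from `uname -r` (or 
platform.release() in Python) and passed to has_fix().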

regards,
chris hunter
yale hpc group

> From: "Lassus, Magnus" <[email protected]>
> To: "[email protected]"
>       <[email protected]>
> Subject: [lustre-discuss] o2ib (ib_qib) with 2.7.0 rpms on centos 6.6:
>       LNetError: kiblnd_init_rdma: Src buffer exhausted: 1 frags
> Message-ID:
>       
> <he1pr04mb1273c36e676e1824d8b2e3a494...@he1pr04mb1273.eurprd04.prod.outlook.com>
>       
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I fail to understand where I am going wrong in getting o2ib working using the 
> 2.7.0 rpms on top of CentOS 6.6. Running LNet selftest, I see:
>
> Nov 17 18:22:40 ss08 kernel: LNet: Added LNI 10.165.32.18@o2ib [8/256/0/180]
> Nov 17 18:24:40 ss08 kernel: LNetError: 
> 12532:0:(o2iblnd_cb.c:1123:kiblnd_init_rdma()) Src buffer exhausted: 1 frags
> Nov 17 18:24:40 ss08 kernel: LustreError: 
> 12553:0:(brw_test.c:212:brw_check_page()) Bad data in page ffffea0070c20800: 
> 0xbeefbeefbeefbeef, 0xeeb0eeb1eeb2eeb3 expec
> Nov 17 18:24:40 ss08 kernel: LustreError: 
> 12553:0:(brw_test.c:238:brw_check_bulk()) Bulk page ffffea0070c20800 (0/256) 
> is corrupted!
> Nov 17 18:24:40 ss08 kernel: LustreError: 
> 12553:0:(brw_test.c:343:brw_client_done_rpc()) Bulk data from 
> 12345-10.165.32.18@o2ib is corrupted!
> Nov 17 18:24:40 ss08 kernel: LNetError: 
> 12532:0:(o2iblnd_cb.c:1690:kiblnd_reply()) Can't setup rdma for GET from 
> 10.165.32.18@o2ib: -71
> Nov 17 18:25:31 ss08 kernel: LNetError: 
> 12529:0:(o2iblnd_cb.c:3036:kiblnd_check_txs_locked()) Timed out tx: 
> active_txs, 0 seconds
> Nov 17 18:25:31 ss08 kernel: LNetError: 
> 12529:0:(o2iblnd_cb.c:3099:kiblnd_check_conns()) Timed out RDMA with 
> 10.165.32.18@o2ib (0): c: 7, oc: 0, rc: 7
> Nov 17 18:25:31 ss08 kernel: LustreError: 
> 12558:0:(brw_test.c:388:brw_bulk_ready()) BRW bulk WRITE failed for RPC from 
> 12345-10.165.32.18@o2ib: -103
> Nov 17 18:25:31 ss08 kernel: LustreError: 
> 12558:0:(brw_test.c:362:brw_server_rpc_done()) Bulk transfer from 
> 12345-10.165.32.18@o2ib has failed: -5
> Nov 17 18:25:48 ss08 kernel: LNet: 
> 12581:0:(rpc.c:1077:srpc_client_rpc_expired()) Client RPC expired: service 
> 11, peer 12345-10.165.32.18@o2ib, timeout 64.
> Nov 17 18:25:48 ss08 kernel: LustreError: 
> 12555:0:(brw_test.c:318:brw_client_done_rpc()) BRW RPC to 
> 12345-10.165.32.18@o2ib failed with -110
>
> # rpm -qa | egrep 'lustre|kernel' | sort
> dracut-kernel-004-356.el6.noarch
> kernel-2.6.32-504.8.1.el6_lustre.x86_64
> kernel-devel-2.6.32-504.8.1.el6_lustre.x86_64
> kernel-firmware-2.6.32-504.8.1.el6_lustre.x86_64
> kernel-headers-2.6.32-504.8.1.el6_lustre.x86_64
> lustre-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
> lustre-iokit-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
> lustre-modules-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
> lustre-osd-ldiskfs-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
> lustre-osd-ldiskfs-mount-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
> lustre-tests-2.7.0-2.6.32_504.8.1.el6_lustre.x86_64.x86_64
> perf-2.6.32-504.8.1.el6_lustre.x86_64
> python-perf-2.6.32-504.8.1.el6_lustre.x86_64
>
> Using the latest 2.7.63 build on CentOS 6.7 works.
>
> Any pointers are warmly welcome as I'd prefer to use 2.7.0.
>
> Regards,
> Magnus
>
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
