Hi,
Did anyone try Mellanox OFED 4.4-1.0.0.0?
With Lustre 2.10.4 we have issues on both CentOS 6.10 and 6.9. With
CentOS 6.9 and the previous supported OFED version there are no problems
(CentOS 6.10 is not supported by the previous version).
We are using ConnectX-3 cards on kernel 2.6.32-696.18.7.el6.x86_64.
The first mount after starting openibd fails. The attached 'first.txt'
shows the log.
A second mount succeeds ('second.txt'). The OSTs are slowly added after
some timeouts. Everything seems to work after this.
After this we can unmount and mount again and everything is normal.
However, after reloading the driver (restarting openibd), the mount fails again.
I'll have a go at CentOS 7.5 and contact Mellanox next.
Cheers,
Hans Henrik
Aug 3 13:26:49 node578 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 64,
npartitions: 2
Aug 3 13:26:49 node578 kernel: alg: No test for adler32 (adler32-zlib)
Aug 3 13:26:49 node578 kernel: alg: No test for crc32 (crc32-table)
Aug 3 13:26:49 node578 kernel: alg: No test for crc32 (crc32-pclmul)
Aug 3 13:26:50 node578 kernel: Lustre: Lustre: Build Version: 2.10.4
Aug 3 13:26:50 node578 kernel: LNet: Added LNI 10.21.205.78@o2ib [8/256/0/180]
Aug 3 13:26:53 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.111@o2ib: 4478161 seconds
Aug 3 13:26:53 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed
due to network error: [sent 1533295610/real 1533295613] req@ffff885f76c0bc80
x1607776977551376/t0(0) o250->MGC10.21.10.111@[email protected]@o2ib:26/25 lens
520/544 e 0 to 1 dl 1533295615 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Aug 3 13:26:56 node578 kernel: LustreError:
73562:0:(mgc_request.c:251:do_config_log_add()) MGC10.21.10.111@o2ib: failed
processing log, type 1: rc = -5
Aug 3 13:27:05 node578 kernel: LustreError:
73703:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
Aug 3 13:27:18 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.111@o2ib: 4478186 seconds
Aug 3 13:27:18 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed
due to network error: [sent 1533295635/real 1533295638] req@ffff88bfa9bc3cc0
x1607776977551440/t0(0) o250->MGC10.21.10.111@[email protected]@o2ib:26/25 lens
520/544 e 0 to 1 dl 1533295645 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Aug 3 13:27:27 node578 kernel: LustreError: 15c-8: MGC10.21.10.111@o2ib: The
configuration from log 'hpc-client' failed (-5). This may be the result of
communication errors between this node and the MGS, a bad configuration, or
other errors. See the syslog for more information.
Aug 3 13:27:27 node578 kernel: Lustre: Unmounted hpc-client
Aug 3 13:27:27 node578 kernel: LustreError:
73562:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-5)
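
To line up the ptlrpc_expire_one_request() entries with the syslog
timestamps, it can help to decode their numeric fields. The following is
a minimal sketch, assuming that the "sent", "real" and "dl" values are
Unix epoch seconds (the values above are consistent with that). The
function names are hypothetical, and the sample is a trimmed excerpt of
the log above with the target field replaced by "...".

```python
import re
from datetime import datetime, timezone

# Matches "[sent N/real N]" and the later "dl N" field, even when the
# log entry is wrapped over several lines.
REQ_TIMES = re.compile(r"\[sent\s+(\d+)/real\s+(\d+)\].*?\bdl\s+(\d+)", re.DOTALL)


def to_utc(epoch):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def request_times(log_text):
    """Return one dict per expired request found in log_text."""
    results = []
    for sent, real, dl in REQ_TIMES.findall(log_text):
        sent, real, dl = int(sent), int(real), int(dl)
        results.append({
            "sent": to_utc(sent),
            # "real 0" appears in the "timed out for sent delay" entries,
            # so it is reported as None rather than as 1970-01-01.
            "real": to_utc(real) if real else None,
            "deadline": to_utc(dl),
            "deadline_after_sent_s": dl - sent,
        })
    return results


sample = (
    "due to network error: [sent 1533295610/real 1533295613] req@ffff885f76c0bc80\n"
    "x1607776977551376/t0(0) ... lens\n"
    "520/544 e 0 to 1 dl 1533295615 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1\n"
)

for entry in request_times(sample):
    print(entry["sent"].isoformat(), entry["deadline_after_sent_s"])
```

The timestamps are printed in UTC, so they differ from the syslog times
by the node's local UTC offset.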
Aug 3 13:41:33 node578 kernel: Lustre: hpc: root_squash is set to 99:99
Aug 3 13:41:33 node578 kernel: Lustre: hpc: nosquash_nids set to
172.20.1.10@tcp1 172.20.1.221@tcp1 172.20.1.71@tcp1 10.121.16.11@tcp1
Aug 3 13:41:39 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed
out for sent delay: [sent 1533296493/real 0] req@ffff885f98ec79c0
x1607776977551760/t0(0)
o38->[email protected]@o2ib:12/10 lens 520/544 e 0
to 1 dl 1533296498 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 3 13:42:51 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.102@o2ib: 4479119 seconds
Aug 3 13:42:51 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed
due to network error: [sent 1533296568/real 1533296571] req@ffff88bf97e36cc0
x1607776977551840/t0(0)
o38->[email protected]@o2ib:12/10 lens 520/544 e 0
to 1 dl 1533296573 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Aug 3 13:43:14 node578 kernel: Lustre: Mounted hpc-client
Aug 3 13:43:16 node578 kernel: LustreError:
73774:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
Aug 3 13:43:17 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.112@o2ib: 4479145 seconds
Aug 3 13:43:17 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed
due to network error: [sent 1533296594/real 1533296597] req@ffff885f98ec7cc0
x1607776977551904/t0(0)
o8->[email protected]@o2ib:28/4 lens 520/544 e 0 to
1 dl 1533296599 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Aug 3 13:43:18 node578 kernel: LustreError:
73775:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
Aug 3 13:43:19 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.121@o2ib: 5 seconds
Aug 3 13:43:19 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 3 previous similar
messages
Aug 3 13:43:33 node578 kernel: LustreError:
73782:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
Aug 3 13:43:38 node578 kernel: LustreError:
73784:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
Aug 3 13:43:38 node578 kernel: LustreError:
73784:0:(llite_lib.c:1772:ll_statfs_internal()) Skipped 1 previous similar
message
Aug 3 13:43:44 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed
out for sent delay: [sent 1533296619/real 0] req@ffff88bfad20cc80
x1607776977552208/t0(0)
o8->[email protected]@o2ib:28/4 lens 520/544 e 0 to
1 dl 1533296624 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 3 13:43:44 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar
messages
Aug 3 13:43:44 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.120@o2ib: 4479172 seconds
Aug 3 13:43:44 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 1 previous similar
message
Aug 3 13:44:34 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed
out for sent delay: [sent 1533296669/real 0] req@ffff88bf97f9fcc0
x1607776977552816/t0(0)
o8->[email protected]@o2ib:28/4 lens 520/544 e 0 to
1 dl 1533296674 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 3 13:44:34 node578 kernel: Lustre:
73626:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 3 previous similar
messages
Aug 3 13:44:59 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for
10.21.10.113@o2ib: 2 seconds
Aug 3 13:44:59 node578 kernel: LNet:
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 1 previous similar
message
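
The "Timed out tx" entries above report both plausible values (2 and 5
seconds) and very large ones such as 4478161 seconds (about 51.8 days).
The following is a minimal sketch for grouping these values per NID so
the large ones stand out. The function names and the one-hour threshold
are assumptions chosen for illustration, not anything defined by Lustre
or LNet.

```python
import re
from collections import defaultdict

# \s+ also matches newlines, so wrapped log entries are handled.
TX_TIMEOUT = re.compile(r"Timed out tx for\s+([^\s:]+):\s+(\d+)\s+seconds")


def summarize_tx_timeouts(log_text):
    """Return {nid: [seconds, ...]} for every 'Timed out tx' entry."""
    per_nid = defaultdict(list)
    for nid, seconds in TX_TIMEOUT.findall(log_text):
        per_nid[nid].append(int(seconds))
    return dict(per_nid)


def is_implausible(seconds, threshold=3600):
    """Flag values above an arbitrary threshold (assumed: one hour)."""
    return seconds > threshold


sample = (
    "Aug 3 13:26:53 node578 kernel: LNet:\n"
    "73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for\n"
    "10.21.10.111@o2ib: 4478161 seconds\n"
    "Aug 3 13:43:19 node578 kernel: LNet:\n"
    "73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for\n"
    "10.21.10.121@o2ib: 5 seconds\n"
)

for nid, values in summarize_tx_timeouts(sample).items():
    flagged = [v for v in values if is_implausible(v)]
    print(nid, values, "implausible:", flagged)
```

Running this over the full 'first.txt' and 'second.txt' would show which
server NIDs report the large values and which report ordinary timeouts.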