On 11/08/2016 16:10, Colin Faber wrote:
> First glance indicates you're having network connectivity problems,
> (possibly driver issue with your NIC?)
I don't seem to have had any problems with any other services running on
the cluster, and there are no messages in the journal or the /var/log
files relating to network errors.
Oddly, though, when the /home filesystem hangs, the /storage and /scratch
filesystems, which are served by the same Lustre servers, continue to
respond without problems.
What does seem to have some bearing on it is that the first few writes
succeed and then it hangs. Though it was first noticed through Samba, it
also appears to happen when logged in to the console directly.
> (Check MTU settings, etc?)
Pasting as quotation as it stops thunderbird from wrapping the text.....
root@test-r710:~# ifconfig
eno1 Link encap:Ethernet HWaddr 00:26:b9:84:c7:8d
inet addr:192.168.1.80 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::226:b9ff:fe84:c78d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8516 errors:0 dropped:0 overruns:0 frame:0
TX packets:23199 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5297958 (5.2 MB) TX bytes:3222616 (3.2 MB)
eno2 Link encap:Ethernet HWaddr 00:26:b9:84:c7:8f
inet addr:192.168.0.80 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::226:b9ff:fe84:c78f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1374513 errors:0 dropped:0 overruns:0 frame:0
TX packets:168485 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2026863011 (2.0 GB) TX bytes:21861558 (21.8 MB)
eno4 Link encap:Ethernet HWaddr 00:26:b9:84:c7:93
inet addr:137.205.232.159 Bcast:137.205.232.255 Mask:255.255.255.128
inet6 addr: fe80::226:b9ff:fe84:c793/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:11483 errors:0 dropped:0 overruns:0 frame:0
TX packets:10560 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3504764 (3.5 MB) TX bytes:5731764 (5.7 MB)
root@test-r710:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 137.205.232.254 0.0.0.0 UG 0 0 0 eno4
137.205.232.128 0.0.0.0 255.255.255.128 U 0 0 0 eno4
192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eno2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eno1
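On the MTU question, the MTU of 1500 shown above only describes this host's interfaces, not the path to the servers. The following is a minimal sketch in Python, not something that was run for this report, of how the error counters could be pulled out of the ifconfig text and how a don't-fragment ping could be built to test the path. The function names are hypothetical, the 192.168.0.4 target is taken from the fstab entries in this message, the `-M do` option assumes the iputils ping found on Linux, and the 28-byte overhead assumes plain IPv4 ICMP with no IP options.

```python
import re


def parse_ifconfig(text):
    """Collect MTU and summed RX/TX error and drop counters per interface
    from old-style net-tools ifconfig output (hypothetical helper)."""
    stats = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
            stats[current] = {"mtu": None, "errors": 0, "dropped": 0}
            continue
        if current is None:
            continue
        mtu = re.search(r"MTU:(\d+)", line)
        if mtu:
            stats[current]["mtu"] = int(mtu.group(1))
        # The "RX packets" and "TX packets" lines both carry the counters.
        if re.search(r"\b[RT]X packets:", line):
            errors = re.search(r"errors:(\d+)", line)
            dropped = re.search(r"dropped:(\d+)", line)
            if errors:
                stats[current]["errors"] += int(errors.group(1))
            if dropped:
                stats[current]["dropped"] += int(dropped.group(1))
    return stats


def df_ping_command(host, mtu, count=3):
    """Build a ping command that sends the largest ICMP payload fitting in
    one unfragmented IPv4 packet of the given MTU."""
    payload = mtu - 28  # 20-byte IPv4 header + 8-byte ICMP header
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), host]


# Example: the command that would probe the Lustre network at MTU 1500.
command = df_ping_command("192.168.0.4", 1500)
```

If a ping built this way fails while a small ping succeeds, something on the path has a smaller MTU than the interface, which would be worth ruling out before looking at Lustre itself.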
Lustre mounts in fstab :
# Lustre mounted
192.168.0.4@tcp0:/storage /storage lustre defaults,_netdev,flock 0 0
192.168.0.4@tcp0:/home /home lustre defaults,_netdev,flock 0 0
192.168.0.4@tcp0:/scratch /scratch lustre defaults,_netdev,flock 0 0
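As a cross-check that all three mounts go through the same server address and the same local interface, here is a small Python sketch that splits an fstab entry of this form into its parts and tests the address against a subnet from the routing table. The helper names are hypothetical, and it only looks at the address named in the mount spec (the one the client contacts at mount time); it says nothing about which servers actually hold the data for each filesystem.

```python
import ipaddress


def parse_lustre_fstab(line):
    """Split one Lustre fstab entry into address, LNet network, filesystem
    name, mount point and options (hypothetical helper)."""
    fields = line.split()
    device, mountpoint, fstype, options = fields[:4]
    if fstype != "lustre":
        raise ValueError("not a lustre entry: " + line)
    nid, fsname = device.split(":/", 1)   # "192.168.0.4@tcp0", "home"
    address, lnet = nid.split("@", 1)     # "192.168.0.4", "tcp0"
    return {
        "address": address,
        "lnet": lnet,
        "fsname": fsname,
        "mountpoint": mountpoint,
        "options": options.split(","),
    }


def on_subnet(address, network):
    """True if the address falls inside the given CIDR network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


entries = [
    parse_lustre_fstab(line)
    for line in (
        "192.168.0.4@tcp0:/storage /storage lustre defaults,_netdev,flock 0 0",
        "192.168.0.4@tcp0:/home /home lustre defaults,_netdev,flock 0 0",
        "192.168.0.4@tcp0:/scratch /scratch lustre defaults,_netdev,flock 0 0",
    )
]
# All three entries should point at one address on the eno2 network.
same_server = len({entry["address"] for entry in entries}) == 1
via_eno2 = all(on_subnet(entry["address"], "192.168.0.0/24") for entry in entries)
```

With the entries as posted, both checks come out true, which fits the observation that /storage and /scratch keep working over the same link while /home hangs.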
I've also tried compiling the latest source and installing those modules
(Lustre: Build Version: 2.8.56_26_g6fad3ab). This does not seem to have
the problem with MATLAB (mentioned about a month or so ago), but it still
has the hanging problem.
The Lustre startup logs in the journal are here :
Aug 12 12:57:10 test-r710 kernel: Lustre: Lustre: Build Version: 2.8.56_26_g6fad3ab
Aug 12 12:57:10 test-r710 kernel: Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.56_26_g6fad3ab)
Aug 12 12:57:10 test-r710 kernel: Lustre: Trying to mount a client with IR setting not compatible with current mgc. Force to use current mgc setting that is IR disabled.
Aug 12 12:57:10 test-r710 kernel: Lustre: Mounted home-client
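The version warning in that log is worth keeping an eye on when comparing builds. A rough Python sketch of pulling the two versions out of the warning text so the gap can be compared numerically; the function names are hypothetical, and the pattern only assumes the wording shown in the log above (with the wrapped line joined), not any other Lustre message format.

```python
import re


def version_tuple(version):
    """Keep the leading dotted numeric part of a version string,
    e.g. "2.8.56_26_g6fad3ab" becomes (2, 8, 56)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError("no numeric version in " + repr(version))
    return tuple(int(part) for part in match.group(0).split("."))


def parse_version_warning(message):
    """Return (server_version, client_version) tuples from the
    'much older than client' warning, or None if it is not present."""
    match = re.search(
        r"version \(([^)]+)\) is much older than client\. "
        r"Consider upgrading server \(([^)]+)\)",
        message,
    )
    if not match:
        return None
    return version_tuple(match.group(1)), version_tuple(match.group(2))


warning = (
    "Lustre: Server MGS version (2.1.0.0) is much older than client. "
    "Consider upgrading server (2.8.56_26_g6fad3ab)"
)
server, client = parse_version_warning(warning)
# Compare major.minor only; the trailing parts differ in length.
server_is_older = server[:2] < client[:2]
```

Running the same parse over the journal after each client rebuild would make it easy to record which client version was tested against the 2.1 servers.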
Cheers.
Phill.