Dear list,
I am running a stress test on a new Lustre file system with 10 OSSs,
using 70 client nodes. (Each server has a 10Gb Ethernet connection, each
client has a 1Gb Ethernet connection, and each OSS serves 3 OSTs on 3 RAID6
volumes.)
Each time, after about 4 hours, the clients begin to freeze one
after another. The command "lfs check osts" shows that the frozen clients
cannot access some OSTs:
error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0008-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0009-osc-c9b82800': Resource temporarily unavailable (11)
and the command "lctl ping server" returns "Input/output error".
However, the servers are not very busy (util% < 10) when the
clients are frozen. My questions are:
1. Why can't the clients reconnect when the servers are not very
busy?
2. I have set timeout=1000. Do I need to increase it to
a larger value?
3. Are there any other parameters that need to be tuned under
heavy load?
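
In case it helps anyone compare clients, here is a minimal Python sketch for
collecting the unavailable OSTs from saved "lfs check osts" output. It assumes
the error lines have exactly the format quoted above. The function name
failed_osts and the regular expression are only illustrative and are not part
of any Lustre tool.

```python
import re

# Assumed line format, taken from the output quoted in this mail:
#   error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
ERROR_RE = re.compile(
    r"error: check '(?P<device>[^']+)': (?P<msg>.+?) \((?P<errno>\d+)\)"
)


def failed_osts(output):
    """Return a list of (device, message, errno) tuples, one per error line."""
    # Collapse all whitespace first, so messages that were wrapped across
    # lines (as happens in mail) are still matched as a single record.
    text = re.sub(r"\s+", " ", output)
    return [
        (m.group("device"), m.group("msg"), int(m.group("errno")))
        for m in ERROR_RE.finditer(text)
    ]


sample = """\
error: check 'testfs-OST0007-osc-c9b82800': Resource
temporarily unavailable (11)
error: check 'testfs-OST0008-osc-c9b82800': Resource
temporarily unavailable (11)
"""

for device, msg, errno in failed_osts(sample):
    print(device, errno, msg)
```

Running the same function over the output saved from each frozen client would
show whether they all lose the same OSTs or different ones.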
Best Regards
Lu Wang
--------------------------------------------------------------
Computing Center
IHEP
Beijing 100049,China Email: [email protected]
--------------------------------------------------------------