Dear list,

I am running a stress test on a new 10-OSS Lustre file system using 70 client
nodes. (Each server has a 10Gb Ethernet connection, each client has a 1Gb
Ethernet connection, and each OSS has 3 OSTs on 3 RAID6 volumes.)

Each time, after about 4 hours, the clients begin to freeze one after another.
The command "lfs check osts" shows that the frozen clients cannot access some
OSTs:

    error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
    error: check 'testfs-OST0008-osc-c9b82800': Resource temporarily unavailable (11)
    error: check 'testfs-OST0009-osc-c9b82800': Resource temporarily unavailable (11)

In addition, the command "lctl ping server" reports "Input/output error".
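
A minimal sketch for collecting the affected OSTs from output like the lines quoted above. It assumes the error-line format is exactly what appears in this message; the function name `failed_devices` is hypothetical and is not part of any Lustre tool. Wrapped lines are rejoined before matching.

```python
import re

# Assumed line format, taken from the errors quoted in this message:
#   error: check '<device>': <message> (<errno>)
ERROR_LINE = re.compile(
    r"error: check '(?P<device>[^']+)': (?P<message>.+?) \((?P<errno>\d+)\)"
)


def failed_devices(output):
    """Return (device, message, errno) for each error in `lfs check osts` output."""
    # Rejoin lines first so that error messages wrapped by a mail client
    # or terminal are still matched as a single entry.
    text = " ".join(line.strip() for line in output.splitlines())
    return [
        (m.group("device"), m.group("message"), int(m.group("errno")))
        for m in ERROR_LINE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "error: check 'testfs-OST0007-osc-c9b82800': Resource \n"
        "temporarily unavailable (11)\n"
    )
    for device, message, errno in failed_devices(sample):
        print(device, message, errno)
```

Running this over the output saved from each frozen client would show whether the same OSTs, and therefore the same OSS, are unreachable from every client.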
                                
However, the servers are not very busy (util% < 10) when the clients are frozen.
My questions are:

1. Why can't the clients reconnect when the servers are not busy?
2. I have set timeout=1000. Do I need to increase it further?
3. Are there any other variables that need to be tuned under heavy load?

Best Regards
Lu Wang
--------------------------------------------------------------
Computing Center
IHEP
Beijing 100049, China
Email: [email protected]
--------------------------------------------------------------


