Hi All,

We have 6 nodes running GFS2 under CentOS 5.3, all connecting via Cisco 2960G 
switches to an MD3000i with 8 x 146GB SAS 15K drives. These nodes run a PHP 
website, pulling their PHP and image files from a GFS2 volume exported over 
iSCSI from the MD3000i.

The problem is that since inception we've seen issues whereby the HTTPD 
processes go into state 'D' (uninterruptible sleep), effectively 'zombied', 
and the only way we have to recover is to restart all the nodes in the cluster.
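
For anyone wanting to check for the same symptom, here is a minimal sketch of 
how the stuck processes could be listed. It is an illustration rather than the 
tooling we actually use: it assumes Python 3 and a Linux /proc filesystem, and 
the function names (parse_stat, d_state_processes) are made up for this example. 
It reads /proc/<pid>/stat for the state field and /proc/<pid>/wchan for the 
kernel function the process is sleeping in, which may hint at whether the wait 
is in GFS2 locking or in the iSCSI/block layer.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def read_file(path):
    """Read a small /proc file, returning '' if the process has gone away."""
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        return ""


def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every process currently in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        text = read_file(os.path.join(proc_root, entry, "stat"))
        if not text:
            continue
        pid, comm, state = parse_stat(text)
        if state == "D":
            wchan = read_file(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print("%d\t%s\t%s" % (pid, comm, wchan or "?"))
```

Run on each node while the hang is in progress, the wchan column should at 
least show whether all the httpd processes are blocked in the same place.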

I've tuned demote_secs down from 300 to 20 seconds on the assumption that 
file locking is causing the issue. We're also running with the following 
GFS values:

        <gfs_controld plock_ownership="1" plock_rate_limit="0"/>

Can anyone give me some pointers on what we should be investigating to find 
out why this is failing? I've had our networks team crawl over the networking 
and that all seems fine. The MTU is set correctly on the MD3000i and on the 
individual nodes. I've also used the ping_pong tool: against a single file on 
the GFS cluster from one node we get around 90K locks per second. If I run 
ping_pong against the same file from two nodes, that drops to around 70 locks 
per second. I don't think that's the issue though.
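
To make the lock-rate numbers easier to compare, here is a rough sketch of 
the kind of measurement involved. This is not the ping_pong tool itself, just 
a simplified stand-in that assumes Python 3: it repeatedly takes and releases 
an exclusive POSIX (fcntl) lock on the first byte of a file and reports the 
rate. The function name lock_rate and the default duration are made up for 
this example.

```python
import fcntl
import os
import sys
import time


def lock_rate(path, seconds=5.0):
    """Return lock/unlock cycles per second on byte 0 of the given file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    count = 0
    try:
        deadline = time.time() + seconds
        while time.time() < deadline:
            # Exclusive lock on 1 byte at offset 0; blocks while another
            # process (or another node) holds a conflicting lock.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            count += 1
    finally:
        os.close(fd)
    return count / seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    print("%.0f locks/sec" % lock_rate(sys.argv[1]))
```

Running it against the same file on one node, and then on two nodes at once, 
should show the same kind of drop as contended locks have to be passed between 
nodes.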

If anyone can provide some insight into what to change, what to debug or 
how to investigate this further, it'd be greatly appreciated.


Thanks
Gavin

Gavin Conway
Senior Engineer, Operations (Systems Group), UKSolutions

Telephone: 0845 004 1333, option 2
Email: gavin.con...@uksolutions.co.uk
Web: www.uksolutions.co.uk