IOzone tests. I was testing with multiple threads:

iozone -l 32 -O -i 0 -i 1 -i 2 -e -+n -r 4K -s 4G > test.txt
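
For reference, here is what those flags request, per my reading of the iozone man page (worth double-checking against your version):

# -l 32        run 32 parallel processes (throughput mode)
# -O           report results in operations per second
# -i 0/1/2     write/rewrite, read/reread, and random read/write tests
# -e           include flush (fsync/fclose) time in the timings
# -+n          skip the retest phases
# -r 4K -s 4G  4 KB records against a 4 GB file per process,
#              so roughly 128 GB of aggregate traffic at 32 processes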

This crashed the NFS store.

I backed it down to 4 processes and it ran fine. I also did a standard iozone -a run for the automatic tests.
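
That is, the same command with only the process count lowered:

iozone -l 4 -O -i 0 -i 1 -i 2 -e -+n -r 4K -s 4G > test.txt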

IP traffic throughput peaked at around 800 Mb/sec from the VM and didn't crash the NFS store. I am trying to figure out how the previous test caused issues with the NFS store. In actuality, the store never crashed; it lost IP connectivity, which caused the hosts to think it was dead. This is strange, because I am running link bonding across multiple trunked switches, so I should be able to pull any network cable out of my setup without causing an issue.
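
In case it helps anyone sanity-check the failover theory, this is roughly what I would look at on the storage box (the interface names here are guesses for my setup):

# Bond mode and per-slave link status (shows whether a slave ever went down):
cat /proc/net/bonding/bond0
# Confirm each slave actually negotiated gigabit:
ethtool eth0 | grep -i speed
# Look for slave failover events logged during the test window:
grep -i bonding /var/log/messages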

I am exporting an NFS store to the hosts. Thoughts?



Trevor Francis
Partner
46 Labs | PeerEdge Cloud Switch (PeCS)
http://www.46labs.com | http://www.peeredge.net
720-214-3643- Voice
tre...@46labs.com
 
Solutions Provider for the Telecom Industry

 

On Nov 25, 2012, at 12:04 PM, Ahmad Emneina <ahmad.emne...@citrix.com> wrote:

What tests were you running, and what kind of throughput were you seeing? VM speed throttling probably applies to VM-to-VM or VM-to-Internet traffic, not as a QoS limit on storage throughput. That would probably have to be enforced on the hypervisor manually; I don't think CloudStack has that feature yet.
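
As a rough sketch of what manual enforcement could look like, using Linux tc on the host's storage interface. The interface name, rate, and NFS server address below are assumptions, and dom0 on XenServer may need this adapted:

# Cap traffic headed to the NFS server so one busy guest can't saturate the link.
tc qdisc add dev bond0 root handle 1: htb
tc class add dev bond0 parent 1: classid 1:10 htb rate 600mbit ceil 800mbit
tc filter add dev bond0 parent 1: protocol ip u32 match ip dst 172.16.0.5/32 flowid 1:10
# Traffic matching no filter bypasses the shaper (no "default" class is set).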

Ahmad

On Nov 25, 2012, at 9:50 AM, "Trevor Francis" <trevor.fran...@tgrahamcapital.com> wrote:

Why would a high-IO VM cause this?

The hosts run bonded GigE for storage/management, and the storage server runs quad-bonded GigE. There shouldn't be a scenario where a VM can take out the storage server, or even a host for that matter. Also, VM speed is limited to 1000 Mb/sec.

Thoughts?


Trevor Francis
Partner
46 Labs | PeerEdge Cloud Switch (PeCS)
http://www.46labs.com | http://www.peeredge.net
720-214-3643- Voice
tre...@46labs.com

Solutions Provider for the Telecom Industry


On Nov 25, 2012, at 11:42 AM, Ahmad Emneina <ahmad.emne...@citrix.com> wrote:

This is expected behavior to prevent disk corruption during a host communication outage.

Excerpt from [1]:
'The worst-case scenario for HA is the situation where a host is thought to be off-line but is actually still writing to the shared storage, because this can result in corruption of persistent data. To prevent this situation without requiring active power strip controls, XenServer employs hypervisor-level fencing. This is a Xen modification which hard-powers off the host at a very low-level if it does not hear regularly from a watchdog process running in the control domain. Because it is implemented at a very low-level, this also protects the storage in the case where the control domain becomes unresponsive for some reason.'

[1] http://support.citrix.com/servlet/KbServlet/download/21018-102-664364/High%20Availability%20for%20Citrix%20XenServer.pdf

Ahmad

On Nov 25, 2012, at 7:51 AM, "Trevor Francis" <trevor.fran...@tgrahamcapital.com> wrote:

We performed an IOzone test through one of our VMs to benchmark our NFS store. It saturated the link, causing the NFS server to stop responding (according to the logs on the hosts).

This caused every one of our hosts (running XS 6.0.2) to reboot itself.

Nov 25 09:13:24 compute0 heartbeat: Problem with /var/run/sr-mount/6b407ac5-aca7-1ade-de4e-765a728d6f52/hb-365a44b3-8083-4b3e-a748-498f3f9b0017
Nov 25 09:13:24 compute0 kernel: nfs: server 172.16.0.5 not responding, timed out
Nov 25 09:15:56 compute0 syslogd 1.4.1: restart.


We are running standard NFS on a Linux server. The server reported no errors.
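
For what it's worth, a typical hard NFS mount from a host would look something like this (the export path and timeout values are illustrative, not our exact config):

# timeo is in tenths of a second; with a hard mount the client retries forever,
# so a dead server makes writes (like the heartbeat file) hang rather than error out.
mount -t nfs -o hard,intr,timeo=100,retrans=3 172.16.0.5:/export/primary /mnt/primary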

We are running CS4.

Why would this happen?

Trevor Francis
Partner
46 Labs | PeerEdge Cloud Switch (PeCS)
http://www.46labs.com | http://www.peeredge.net
720-214-3643- Voice
tre...@46labs.com

Solutions Provider for the Telecom Industry



