The cluster stack uses the interconnect to negotiate the locks.
That's how it is able to provide data coherency. Other solutions
do not provide that kind of coherency.
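
If you want to see the lock traffic behind a scan like your du run, here is a rough sketch (assuming debugfs is mounted at its usual path; the device is your OCFS2 volume):

# mount debugfs if it is not already mounted
mount -t debugfs debugfs /sys/kernel/debug
# dump the lock resources this node holds; every inode the scan touches
# needs one, and each is negotiated with the other node over the interconnect
debugfs.ocfs2 -R "fs_locks" /dev/emcpowera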

If you are referring to interconnect latencies in milliseconds, that is not good.
Milliseconds are the unit you typically see for disk access, not for a network interconnect.

On 08/19/2011 01:30 PM, Nick Geron wrote:

Actually, those first numbers were from GigE links going out to physical 
switches and back in.  To optimize the private link, I upgraded the VMs' NICs to 
10GE (VMXNET3, the VMware paravirtualized driver) and moved them onto the same 
host with a dedicated software switch between them.  The numbers only improved 
slightly, and got worse on about 1 of every 100 pings (roughly 1 ms).

10GE between VMs under the same hypervisor: rtt min/avg/max/mdev = 
0.194/0.307/1.003/0.132 ms
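
For a larger sample with a tighter interval, something like the following works (the address is a placeholder for the peer's private-interconnect IP; intervals below 0.2 s need root):

# 1000 probes, 10 ms apart, summary line only
ping -c 1000 -i 0.01 -q 10.0.0.2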

What I don't understand is why my OCFS2 cluster suffers so greatly.  There's 
quite a big difference between a wall time of 0.17 seconds to traverse the data 
over an iSCSI link and the 4 minutes to do the same on OCFS2 with a private 
interconnect averaging under 1 ms of latency.  For that matter, the whole setup is 
running on another clustered FS (VMFS3) over the same network to the same SAN.  
I guess I'm just a little dumbfounded that OCFS2 is so much more demanding than 
other clustered FSs and alternative network storage options.
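
(As a rough sanity check on my own numbers: 4 minutes 24 seconds over roughly 12,000 messages works out to about 264 s / 12,000 ≈ 22 ms per file, which lines up with the 1-40 ms per-lstat times in the strace output further down, so nearly all of the wall time looks like per-file lock traffic rather than data transfer.)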

Is the network really the most likely candidate?  If so, is anyone else running 
OCFS2 from within a VM environment?  Is this technology only worthwhile in the 
physical world?  Is there a sweet spot for network latency that I should strive 
for?  The user guide only makes mention of 'low latency' but lacks figures save 
for heartbeat and timeouts.

-nick

*From:* Sunil Mushran [mailto:sunil.mush...@oracle.com]
*Sent:* Friday, August 19, 2011 2:30 PM
*To:* Nick Geron
*Cc:* ocfs2-users@oss.oracle.com
*Subject:* Re: [Ocfs2-users] IO performance appears slow

Somewhat equivalent but it misses the effect of the workload at that time.

BTW, those are awful numbers for 10G NICs. I get better numbers with GigE.
rtt min/avg/max/mdev = 0.149/0.168/0.188/0.020 ms

You should check the NIC configuration (offloads, coalescing, driver settings) with ethtool.
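
For example (eth1 is just a stand-in for whatever interface carries the interconnect):

ethtool eth1       # link speed, duplex, autonegotiation
ethtool -i eth1    # driver and firmware versions
ethtool -k eth1    # offload settings (TSO/GSO/checksumming)
ethtool -c eth1    # interrupt coalescing settings, which can add latency
ethtool -S eth1    # NIC counters: look for errors and drops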

On 08/19/2011 10:54 AM, Nick Geron wrote:

Thanks for the feedback Sunil,

You are correct that the sys and user times were very low.  I did check the 
response and latency between the two nodes, thinking that could be an issue.  I 
didn't see a problem there, but then again I don't know what the numbers should be.  
Is there a document that outlines a baseline and/or recommendation for that 
link?  The best I can do in this environment is break my host redundancy and 
move both nodes to the same VMware vSwitch with 10G NICs.

Average latency between the two: rtt min/avg/max/mdev = 0.207/0.268/0.360/0.046 
ms.

Are the ping stats dumped from o2net not equivalent to a simple ping between 
the hosts?  Is my reported latency too great for OCFS2 to function well?

Thanks for your assistance.

-Nick

*From:* Sunil Mushran [mailto:sunil.mush...@oracle.com]
*Sent:* Thursday, August 18, 2011 10:26 PM
*To:* Nick Geron
*Cc:* ocfs2-users@oss.oracle.com
*Subject:* Re: [Ocfs2-users] IO performance appears slow

The network interconnect between the VMs is slow. What
would have helped is the sys and user times, but my guess
is that those are low and that most of the time is spent in wall time.

In mainline, o2net dumps stats showing the ping time between
nodes. Unfortunately this kernel is too old for that.
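
For reference, on a mainline kernel those stats show up under debugfs. Roughly, assuming debugfs is mounted as in the earlier example, and noting that the exact file names vary by kernel version:

ls /sys/kernel/debug/o2net/
cat /sys/kernel/debug/o2net/*    # per-socket send/ack timing gathered under real load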

On 08/18/2011 04:24 PM, Nick Geron wrote:

Greetings,

I'm rather new to OCFS2, so please forgive any glaringly ignorant statements.

I'm evaluating file systems and storage layouts for a simple 2-node mail cluster 
using Maildir mailboxes.  I have created a 2-node cluster following the related 
tutorials.  The problem I'm seeing is that general file access using cp, find, 
du, ls, etc. is a significant factor slower on OCFS2 than on alternative local and 
remote disk configurations.  I'm hoping someone can clue me in on whether this 
behavior is normal, or whether I'm missing something in my lab.

*Hosts are identical CentOS 5.5 virtual machines (VMware) running kernel 
2.6.18-238.19.1.el5, spread across 2 ESXi hosts.

*OCFS2 build is ocfs2-2.6.18-238.19.1.el5-1.4.7-1.el5 (tools v 1.4.4-1.el5).

*SAN is an EMC Clariion.  LUN is accessed via iSCSI with EMC PowerPath 
5.5.0.00.00-275

*Nodes share a gigabit network for their private interconnect via two 
interconnected switches (one ESXi host uplinked into each).

*Test data is a 181MB Maildir directory (~12K emails) copied to various types 
of storage.

*Tests are simple bash scripts that run the commands above under the bash time 
builtin, plus strace inspection (example invocations just below this list).
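
For example (the mount points are just placeholders for my various test mounts):

time du -hs /mnt/ocfs2/Maildir
strace -tt -T -o /tmp/du.strace du -hs /mnt/ocfs2/Maildir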

The OCFS2 file system was created with the following (this kernel's module cannot 
mount volumes with the xattr or extended-slotmap features that max-features would add):

mkfs.ocfs2 -N 2 -T mail --fs-features=backup-super,sparse,unwritten,inline-data 
-v /dev/emcpowera

Mount options are limited to '_netdev' at the moment.  I've read a bit about 
changing 'data' from ordered to writeback, but that seems to relate to waits on 
flushing cache to disk.  So far I'm just focusing on reads/lstats.
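
If it helps frame the question, the kind of variation I could try next looks like this (the mount point and option mix are only an example, not what I'm running today):

mount -t ocfs2 -o _netdev,noatime,data=writeback /dev/emcpowera /mnt/mail

noatime in particular would avoid an atime update on every read, which seems relevant to a stat-heavy Maildir tree.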

With a Maildir in place, any operation that must inspect all of the files takes 
quite a while to complete when nothing is cached.  The alarming thing is the 
discrepancy between my OCFS2 numbers and the identical data on local, NFS and 
iSCSI mounts.

Here's some simple data that should illustrate my problem and my confusion:

Command: 'du -hs /path/to/maildir/on/various/mounts'

Storage            Real time to complete (min:sec)
----------------------------------------------------------------------
Local disk         0:0.078
NFS                0:2
iSCSI (EXT3)       0:1.7
iSCSI (OCFS2)      4:24

Other tests, including recursive chown or chmod and ls, show similar results.

Most telling is perhaps the strace output.  There I can see the system calls on 
individual Maildir files.  The time between each call/operation is far longer on 
OCFS2, and there is no hint of externally derived waits.  Nor are there any 
indicators of load from competing processes; nothing else significant is going 
on, and du has free rein of the OS resources.

Output from strace with -tt -T, using du -hs against the Maildir on my EXT3 
iSCSI LUN (/dev/emcpowerb1):

18:03:17.572879 lstat("1313705228.000737.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=715, ...}) = 0 <0.000018>

18:03:17.572944 lstat("1313705228.008426.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=2779, ...}) = 0 <0.000024>

18:03:17.573016 lstat("1313705228.006345.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=2703, ...}) = 0 <0.000020>

18:03:17.573083 lstat("1313705228.001305.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=1831, ...}) = 0 <0.000017>

Output from the same trace against the OCFS2 store

18:06:52.876713 lstat("1313707554.003441.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=2322, ...}) = 0 <0.040896>

18:06:52.917723 lstat("1313707554.003442.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=2316, ...}) = 0 <0.040663>

18:06:52.958473 lstat("1313707554.003443.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=2899, ...}) = 0 <0.000938>

18:06:52.959471 lstat("1313707554.003444.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=2522, ...}) = 0 <0.001106>

18:06:52.960641 lstat("1313707554.003445.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=3451, ...}) = 0 <0.039904>

18:06:53.000644 lstat("1313707554.003446.mbox:2,S", {st_mode=S_IFREG|0644, 
st_size=3150, ...}) = 0 <0.041060>
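
For what it's worth, a quick way to summarize those per-call times (assuming the trace was saved with -o, here to the placeholder file /tmp/du.strace):

# average the per-call durations that strace -T prints between < and >
awk -F'<' '/lstat/ { gsub(/>.*/, "", $2); sum += $2; n++ }
           END { printf "%d lstats, %.4f s average\n", n, sum/n }' /tmp/du.strace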

Is this normal behavior for a current kernel and the most recent 1.4.7 code?  
Does anyone suspect I've blundered somewhere along the way?  I've seen many 
posts to this list about mail cluster setups like mine.  Is anyone on the list 
running a production mail cluster on OCFS2?  I apologize for the length of this 
email.  Thanks.

-Nick Geron
