Re: [Ocfs2-users] ocfs2 performance and scaling

2008-07-22 Thread Sabuj Pattanayek
Hi,

> Try it out. If not, then we have a bottleneck somewhere.
>
> One obvious bottleneck is the global bitmap. The fs works around this by
> using a node-local bitmap cache called localalloc. By default it is 8MB.
> So if you are using 4K/4K (block/cluster), then you will hit the global
> bitmap (and thus the cluster lock) every 2048 extents. If that is a
> bottleneck, you can mount with a larger localalloc.
>
> To mount with a 16MB localalloc, do:
> mount -o localalloc=16

I've given up on using volumes >16TB for now and will just settle for
a 15TB volume and a 10TB volume created with 4k/4k block/cluster sizes
and -T mail. However, per-node throughput is roughly halved when I
start a dd on both nodes, and I know it's not maxing out the storage
speed, which was around 200MB/s when I used XFS. The mount on both orca
and porpoise:

/dev/mapper/vg-ocfs2_0 on /export/ocfs2_0 type ocfs2
(rw,_netdev,localalloc=16,data=writeback,heartbeat=local)
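For reference, the filesystems were created along these lines (the
node-slot count and label here are placeholders, not the exact
invocation):

mkfs.ocfs2 -b 4K -C 4K -T mail -N 2 -L ocfs2_0 /dev/mapper/vg-ocfs2_0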

When the tests are run individually:

orca tmp # time dd if=/dev/zero of=testFile.orca bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 11.3476 s, 180 MB/s

porpoise tmp # time dd if=/dev/zero of=testFile.porpoise bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 12.6702 s, 162 MB/s

Now when I run them almost simultaneously:

orca tmp # time dd if=/dev/zero of=testFile.orca bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 23.9214 s, 85.6 MB/s

porpoise tmp # time dd if=/dev/zero of=testFile.porpoise bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 25.319 s, 80.9 MB/s
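(Note the aggregate barely moves: 85.6 + 80.9 = 166.5 MB/s across both
nodes, versus ~180 MB/s from one node alone, so the two writers are
splitting roughly the same budget rather than scaling.)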

I couldn't stripe the LVM with lvcreate -i 3 because two of the three
physical volumes are smaller than one of them. I'll get these sizes
matched, make sure I'm still below 16T for the LVM, re-create the FS,
and try the test again.
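Once the PV sizes match, the striped LV would be created roughly like
this (the stripe size and extent count are placeholders):

lvcreate -i 3 -I 64 -l 100%FREE -n ocfs2_0 vg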

Thanks,
Sabuj

Re: [Ocfs2-users] ocfs2 performance and scaling

2008-07-17 Thread Sunil Mushran
Sabuj Pattanayek wrote:
> Hi,
>
> I'm using OCFS2 from 2.6.26 with some patches I made that allow for
> the creation of a volume greater than 16TB:
>
> http://oss.oracle.com/pipermail/ocfs2-devel/2008-July/002568.html
> http://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-July/000857.html
>
> The ocfs2-tools-devel post has info regarding the block/cluster size
> (from the mkfs command) used, which will pertain to the following
> question: in general, what sort of performance numbers are people
> seeing for something like time dd if=/dev/zero of=testFile bs=4k
> count=500000? I'm getting anywhere from 120MB/s to 165MB/s. The same
> command on XFS using the same hardware/LVM setup gives me 300MB/s, and
> with GFS2 gives 100MB/s. Currently there's only one node in the
> cluster, but if other nodes are added with similar 4Gb FC HBA hardware,
> will these also achieve ~120-165MB/s write speeds as long as the RAID
> hardware isn't being maxed out?

Try it out. If not, then we have a bottleneck somewhere.

One obvious bottleneck is the global bitmap. The fs works around this by
using a node-local bitmap cache called localalloc. By default it is 8MB.
So if you are using 4K/4K (block/cluster), then you will hit the global
bitmap (and thus the cluster lock) every 2048 extents. If that is a
bottleneck, you can mount with a larger localalloc.

To mount with a 16MB localalloc, do:
mount -o localalloc=16
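The arithmetic behind those numbers: an 8MB localalloc at a 4K cluster
size covers 8MB / 4K = 2048 clusters, so the global bitmap (and its
cluster lock) is hit once per 2048 allocations; a 16MB localalloc
stretches that to once per 4096.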

XFS has delayed allocation, which lets it write data in fewer, larger
extents and thus gives it better I/O throughput for buffered access.

> Here are some bonnie++ benchmarks:
>
> http://structbio.vanderbilt.edu/~pattans/bonnie-porpoise.html
>
> Also, if any devs could look at the patches to see if I missed anything
> that might cause OCFS2 to blow up if it reaches for a block offset
> greater than 2^32 - 1, I would greatly appreciate it (please post in
> reply to the posts on the -devel lists). As far as the write testing
> goes, it's only at 1.1T of 18T written, i.e. it'll take a day or two,
> and then I'll have to try some fseek and read calls for large offsets.

So JBD2 will allow one to go beyond 4 billion blocks. But to make ocfs2
address beyond 16T, you will, for the time being, need to use a
clustersize > 4K.

To make ocfs2 with a 4K clustersize address beyond 16T will need more
changes. See the task titled "Support more than 32-bits worth of
clusters" at:
http://oss.oracle.com/osswiki/OCFS2/LargeTasksList
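The 16T figures fall straight out of 32-bit addressing:

2^32 blocks   x 4K block size    = 16TB   (JBD's 32-bit block limit)
2^32 clusters x 4K cluster size  = 16TB   (32-bit cluster addressing)
2^32 clusters x 64K cluster size = 256TB  (hence clustersize > 4K for now)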

A quick way to fill up space is to use unwritten extents: the fs just
allocates the space and does not bother writing to it. Check out
reserve_space/reserve_space.c in the ocfs2-test project.
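Once allocation crosses 16T, one quick userspace check of large offsets
is to seek deep into the file before reading (the path and offset here
are placeholders):

dd if=/export/ocfs2_0/testFile of=/dev/null bs=1M skip=17000000 count=16

skip= makes dd lseek the input to roughly 16.2TB, i.e. past the
2^32 x 4K boundary, before it reads.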

As far as the kernel patches go, we would like backward compatibility.
As in, not get rid of jbd just yet. Maybe an incompat flag. But this has not
been decided.

Let us know how it goes.

Sunil
