Re: [Ocfs2-users] ocfs2 performance and scaling
Hi,

> Try it out. If not, then we have a bottleneck somewhere. One obvious
> bottleneck is the global bitmap. The fs works around this by using a
> node-local bitmap cache called localalloc. By default it is 8MB. So if
> you are using 4K/4K (block/cluster), you will hit the global bitmap
> (and thus the cluster lock) every 2048 clusters. If that is a
> bottleneck, you can mount with a larger localalloc. To mount with a
> 16MB localalloc, do:
>
> mount -o localalloc=16

I've given up on using volumes larger than 16TB for now and will just settle for a 15TB volume and a 10TB volume, created with 4k/4k block/cluster sizes and -T mail. However, the performance is exactly halved when I start a dd on both nodes. I know it's not maxing out the storage, which sustained around 200MB/s when I used XFS.

The mount on both orca and porpoise:

/dev/mapper/vg-ocfs2_0 on /export/ocfs2_0 type ocfs2 (rw,_netdev,localalloc=16,data=writeback,heartbeat=local)

When the tests are run individually:

orca tmp # time dd if=/dev/zero of=testFile.orca bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 11.3476 s, 180 MB/s

porpoise tmp # time dd if=/dev/zero of=testFile.porpoise bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 12.6702 s, 162 MB/s

Now when I run them almost simultaneously:

orca tmp # time dd if=/dev/zero of=testFile.orca bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 23.9214 s, 85.6 MB/s

porpoise tmp # time dd if=/dev/zero of=testFile.porpoise bs=4k count=500000
500000+0 records in
500000+0 records out
2048000000 bytes (2.0 GB) copied, 25.319 s, 80.9 MB/s

I couldn't stripe the LVM with lvcreate -i 3 because two of the three physical volumes are smaller than the third. I'll get these sizes matched, make sure I'm still below 16T for the LV, re-create the FS, and try the test again.

Thanks,
Sabuj

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
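[Editorial aside: the localalloc figure quoted above is easy to verify. A minimal shell sketch, using only the sizes mentioned in the thread (4K clusters, the default 8MB window, and the 16MB window from `mount -o localalloc=16`), shows how many cluster allocations each window satisfies before the node must refill it from the global bitmap:]

```shell
#!/bin/sh
# How many cluster allocations fit in a localalloc window before the
# node must go back to the global bitmap (a cluster-locked operation).
# Sizes are assumptions taken from the thread: 4K clusters, 8MB default
# window, 16MB window via "mount -o localalloc=16".
CLUSTER_BYTES=4096
for WINDOW_MB in 8 16; do
    ALLOCS=$(( WINDOW_MB * 1024 * 1024 / CLUSTER_BYTES ))
    echo "localalloc=${WINDOW_MB}MB -> ${ALLOCS} allocations per window"
done
```

With the default 8MB window this prints 2048, matching the figure quoted above; doubling the window to 16MB halves how often the global bitmap lock is taken.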
Re: [Ocfs2-users] ocfs2 performance and scaling
Sabuj Pattanayek wrote:
> Hi,
>
> I'm using OCFS2 from 2.6.26 with some patches I made that allow for the
> creation of a volume greater than 16TB:
>
> http://oss.oracle.com/pipermail/ocfs2-devel/2008-July/002568.html
> http://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-July/000857.html
>
> The ocfs2-tools-devel post has info regarding the block/cluster sizes
> (from the mkfs command) used, which pertains to the following question:
> in general, what sort of performance numbers are people seeing for
> something like
>
> time dd if=/dev/zero of=testFile bs=4k count=500000
>
> ? I'm getting anywhere from 120MB/s to 165MB/s. The same command on XFS
> using the same hardware/LVM setup gives me 300MB/s, and with GFS2 gives
> 100MB/s. Currently there's only one node in the cluster, but if other
> nodes are added with similar 4Gb FC HBA hardware, will these also
> achieve ~120-165MB/s write speeds as long as the RAID hardware isn't
> being maxed out?

Try it out. If not, then we have a bottleneck somewhere. One obvious bottleneck is the global bitmap. The fs works around this by using a node-local bitmap cache called localalloc. By default it is 8MB. So if you are using 4K/4K (block/cluster), you will hit the global bitmap (and thus the cluster lock) every 2048 clusters. If that is a bottleneck, you can mount with a larger localalloc. To mount with a 16MB localalloc, do:

mount -o localalloc=16

XFS has delayed allocation, which allows it to write data in fewer extents, giving it better I/O throughput for buffered access.

> Here are some bonnie++ benchmarks:
>
> http://structbio.vanderbilt.edu/~pattans/bonnie-porpoise.html
>
> Also, if any devs could look at the patches to see if I missed anything
> that might cause OCFS2 to blow up if it reaches for a block offset
> greater than 2^32 - 1, I would greatly appreciate it (please post in
> reply to the posts on the -devel lists). As far as the write testing is
> going, it's only at 1.1T of 18T written, i.e. it'll take a day or two,
> and then I'll have to try some fseek and read calls at large offsets.

So JBD2 will allow one to go beyond 4 billion blocks. But to make ocfs2 access beyond 16T, you will for the time being need to use a clustersize larger than 4K. Making ocfs2 with a 4K clustersize access beyond 16T will need more changes. See the task titled "Support more than 32-bits worth of clusters" at:

http://oss.oracle.com/osswiki/OCFS2/LargeTasksList

A quick way to fill up space could be using unwritten extents. It will just allocate the space and not bother writing to it. Check out reserve_space/reserve_space.c in the ocfs2-test project.

As far as the kernel patches go, we would like backward compatibility. As in, not getting rid of jbd just yet. Maybe an incompat flag. But this has not been decided.

Let us know how it goes.

Sunil
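[Editorial aside: the 16T boundary Sunil describes follows from ocfs2 addressing clusters with 32-bit numbers, so the maximum volume size is 2^32 times the cluster size. A small sketch of that arithmetic, assuming the power-of-two cluster sizes mkfs.ocfs2 accepts (4K through 1M):]

```shell
#!/bin/sh
# Max ocfs2 volume size with a 32-bit cluster count: 2^32 * clustersize.
# In TiB that is clustersize / 256, since 2^32 / 2^40 = 1/256.
for CS in 4096 8192 16384 32768 65536 131072 262144 524288 1048576; do
    echo "clustersize=${CS} -> max $(( CS / 256 ))TB"
done
```

A 4K clustersize therefore tops out at 16TB, which is why addressing past 16T without further changes requires a larger clustersize, up to 4096TB at a 1MB clustersize.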