On Fri, 2010-04-09 at 09:16 +1200, Solor Vox wrote:
> Hey all,
>
> I'm going to warn you beforehand: this message contains technical and
> academic discussion of the inner workings of md-RAID and file systems.
> If you haven't had your morning coffee or don't want a headache,
> please stop reading now. :)
>
> If you're still here, I've been trying to work out the optimal chunk
> size, stripe width, and stride for a 6TB RAID-5 array I'm building.
>
> For hardware, I've got 4x1.5TB Samsung SATA2 drives. I'm going to use
> Linux md in a RAID-5 configuration. The primary use for this box is
> HD video and DVD storage, so for argument's sake, let's say that of
> the usable 4.5TB, 4TB is for large files of 8GB and up. I also plan
> on either ext4 or xfs.
>
> One last thing to get out of the way is the meaning of all the block
> sizes. Unfortunately, people tend to use “block size” to mean many
> different things, so to prevent confusion I'm going to use the
> following:
>
> Stride – number of bytes written to a disk before moving to the next
> disk in the array.
> Stripe width – stride size * number of data disks in the array (3 in
> my case).
> Chunk size – file system “block size”, or bytes per inode.
> Page size – Linux kernel cache page size, almost always 4KB on x86
> hardware.
>
> Now comes the fun part: picking the correct values for creating the
> array and the file system. The arguments here are very academic and
> very specific to the intended use. Typically, people try for
> “position” optimization by picking an FS chunk size that matches the
> RAID stripe width. By matching the array, you reduce the number of
> read/write operations needed to access each file. While this works in
> theory, you can't ensure that the stripe is written perfectly across
> the array. And unless your chunk size matches your page size, the
> operation isn't atomic anyway.
>
> The other method is “transfer” optimization, where you make the FS
> chunk sizes smaller, ensuring that files are broken up across the
> array. The theory here is that using more than one drive at a time to
> read a file will increase transfer performance. This, however,
> increases the number of read/write operations needed for a file of
> the same size compared with larger chunks.
>
> Things get even more fun when LVM is thrown into the mix, as LVM
> creates a physical volume that contains the logical volumes. The FS
> is then put on the LV, so trying to align the FS directly to the
> array no longer makes sense. You can set the metadata size for the PV
> so that it is aligned with the array, so the assumption here is that
> the FS should be aligned with the PV.
>
> While this all may seem like a bit much, getting it right can mean an
> extra 30-50MB/s or more from the array. So, has anyone done this type
> of optimization? I'd really rather not spend a week (or more) testing
> different values, as a 6TB array can take several hours to build.
>
> Cheers,
> sV

Just to throw a bit more into the mix...
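Before the gotchas below: here's roughly how the alignment sums come
out if you go the plain md + ext4/xfs route. Note I'm using the usual
mdadm/mke2fs meanings of the words (chunk = amount written per disk,
stride = chunk in FS blocks, stripe width = stride * data disks), which
aren't quite the same as yours, and the 512kB chunk is purely a worked
example, not a recommendation:

    # 4 drives in RAID-5 = 3 data + 1 parity; 512kB chunk is an example value
    mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[bcde]1

    # ext4: stride = chunk / FS block = 512k / 4k = 128 blocks,
    #       stripe-width = stride * 3 data disks = 384 blocks
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0

    # xfs: su = md chunk, sw = number of data disks
    # (mkfs.xfs will usually work this out for itself on an md device)
    mkfs.xfs -d su=512k,sw=3 /dev/md0

    # if you put LVM in between, align the PV data area to a full
    # stripe (3 * 512k = 1536k) rather than fiddling with the metadata
    # size by hand (needs a reasonably recent LVM2)
    pvcreate --dataalignment 1536k /dev/md0

Whether 512kB (or 64kB, or 1MB) is actually the right chunk for 8GB+
video files is exactly the bit only benchmarking will tell you. Now the
gotchas: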
1. I wouldn't touch ext4 for this.

2. What about reiser4?

3. PARTITIONING! Having just lived through it, watch out for the newer
(WD only??) disks that have 4kB sectors but don't report it. That
brought throughput down to < 1MB/sec.

4. If your primary intention is performance (rather than getting the
best of all worlds), why not RAID10? IMO disks are too cheap to bother
with RAID5. (1.5TB is certainly the sweet spot price-wise, but most
mobos have 6 SATA slots.)

I would certainly do some basic testing, as the best answer will
depend on the hardware you choose and the mix of sizes of the files
you wish to serve. I have had poor performance from the SATA
controllers and drivers on some mobos (the ATI Technologies Inc
SB700/SB800 SATA Controller, for example), so some research is a good
idea.

Enjoy your weekend!

Steve

--
Steve Holdoway <[email protected]>
http://www.greengecko.co.nz
MSN: [email protected]
GPG Fingerprint = B337 828D 03E1 4F11 CB90 853C C8AB AF04 EF68 52E0
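P.S. For the basic testing I mentioned, a couple of quick and dirty
checks I'd start with (device names and mount point are just
placeholders):

    # raw sequential read speed of a single drive vs the whole array
    hdparm -t /dev/sdb
    hdparm -t /dev/md0

    # big sequential write through the filesystem, bypassing the page cache
    dd if=/dev/zero of=/mnt/array/testfile bs=1M count=8192 oflag=direct

    # what sector size the drive admits to (needs a recent kernel, and
    # as per point 3 some of the 4kB-sector drives lie about it anyway,
    # so check the model number too)
    cat /sys/block/sdb/queue/physical_block_size

Nothing scientific, but enough to show up a bad chunk size or a
misaligned partition before you commit a week to rebuilding arrays.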
