On Fri, 2010-04-09 at 09:16 +1200, Solor Vox wrote:
> Hey all,
>
> I'm going to warn you beforehand: this message contains technical and
> academic discussion of the inner workings of md-RAID and file systems.
> If you haven't had your morning coffee or don't want a headache,
> please stop reading now. :)
>
> If you're still here, I've been trying to work out the optimal chunk
> size, stripe width, and stride for a 6TB RAID-5 array I'm building.
>
> For hardware, I've got 4x1.5TB Samsung SATA2 drives. I'm going to use
> Linux md in a RAID-5 configuration. The primary use for this box is
> HD video and DVD storage, so for argument's sake, let's say that of
> the usable 4.5TB, 4TB is for large files of 8GB and up. I also plan
> on either ext4 or xfs.
>
> One last thing to get out of the way is the meaning of all the block
> sizes. Unfortunately, people tend to use “block size” to mean many
> different things, so to prevent confusion I'm going to use the
> following:
>
> Stride – number of bytes written to a disk before moving to the next
> disk in the array.
> Stripe width – stride size * number of data disks in the array (3 in
> my case).
> Chunk size – file system “block size”, or bytes per inode.
> Page size – Linux kernel cache page size, almost always 4KB on x86
> hardware.
>
> Now comes the fun part: picking the correct values for creating the
> array and the file system. The arguments here are very academic and
> very specific to the intended use. Typically, people try for
> “position” optimization by picking an FS chunk size that matches the
> RAID stripe width. By matching the array, you reduce the number of
> read/write operations needed to access each file. While this works in
> theory, you can't ensure that the stripe is written perfectly across
> the array. And unless your chunk size matches your page size, the
> operation isn't atomic anyway.
>
> The other method is “transfer” optimization, where you make the FS
> chunk sizes smaller, ensuring that files are broken up across the
> array. The theory here is that using more than one drive at a time to
> read a file will increase transfer performance. This, however,
> increases the number of read/write operations needed for a file of
> the same size compared with larger chunks.
>
> Things get even more fun when LVM is thrown into the mix, as LVM
> creates a physical volume that contains the logical volumes. The FS
> is then put on the LV, so trying to align the FS directly to the
> array no longer makes sense. You can set the metadata size for the PV
> so that it is aligned with the array, so the assumption here is that
> the FS should be aligned with the PV.
>
> While this all may seem like a bit much, getting it right can mean an
> extra 30-50MB/s or more from the array. So, has anyone done this type
> of optimization? I'd really rather not spend a week (or more) testing
> different values, as a 6TB array can take several hours to build.
>
> Cheers,
> sV

Just to throw a bit more into the mix...
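Before the gotchas below: here's roughly how the alignment sums come
out if you go the plain md + ext4/xfs route. Note I'm using the usual
mdadm/mke2fs meanings of the words (chunk = amount written per disk,
stride = chunk in FS blocks, stripe width = stride * data disks), which
aren't quite the same as yours, and the 512kB chunk is purely a worked
example, not a recommendation:

    # 4 drives in RAID-5 = 3 data + 1 parity; 512kB chunk is an example value
    mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=512 /dev/sd[bcde]1

    # ext4: stride = chunk / FS block = 512k / 4k = 128 blocks,
    #       stripe-width = stride * 3 data disks = 384 blocks
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0

    # xfs: su = md chunk, sw = number of data disks
    # (mkfs.xfs will usually work this out for itself on an md device)
    mkfs.xfs -d su=512k,sw=3 /dev/md0

    # if you put LVM in between, align the PV data area to a full
    # stripe (3 * 512k = 1536k) rather than fiddling with the metadata
    # size by hand (needs a reasonably recent LVM2)
    pvcreate --dataalignment 1536k /dev/md0

Whether 512kB (or 64kB, or 1MB) is actually the right chunk for 8GB+
video files is exactly the bit only benchmarking will tell you. Now the
gotchas: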
1. I wouldn't touch ext4 for this.

2. What about reiser4?

3. PARTITIONING! Having just lived through it, watch out for the newer
(WD only??) disks that have 4kB sectors but don't report it. That
brought throughput down to < 1MB/sec.

4. If your primary intention is performance (rather than getting the
best of all worlds), why not RAID10? IMO disks are too cheap to bother
with RAID5. (1.5TB is certainly the sweet spot price-wise, but most
mobos have 6 SATA slots.)

I would certainly do some basic testing, as the best answer will
depend on the hardware you choose and the mix of sizes of the files
you wish to serve. I have had poor performance from the SATA
controllers and drivers on some mobos (the ATI Technologies Inc
SB700/SB800 SATA Controller, for example), so some research is a good
idea.

Enjoy your weekend!

Steve

--
Steve Holdoway <[email protected]>
http://www.greengecko.co.nz
MSN: [email protected]
GPG Fingerprint = B337 828D 03E1 4F11 CB90 853C C8AB AF04 EF68 52E0
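P.S. For the basic testing I mentioned, a couple of quick and dirty
checks I'd start with (device names and mount point are just
placeholders):

    # raw sequential read speed of a single drive vs the whole array
    hdparm -t /dev/sdb
    hdparm -t /dev/md0

    # big sequential write through the filesystem, bypassing the page cache
    dd if=/dev/zero of=/mnt/array/testfile bs=1M count=8192 oflag=direct

    # what sector size the drive admits to (needs a recent kernel, and
    # as per point 3 some of the 4kB-sector drives lie about it anyway,
    # so check the model number too)
    cat /sys/block/sdb/queue/physical_block_size

Nothing scientific, but enough to show up a bad chunk size or a
misaligned partition before you commit a week to rebuilding arrays.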
