On 4/26/2010 9:29 AM, Tim Clewlow wrote:
Hi there,

I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
but the intention is to add more drives as storage requirements
increase.
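
Roughly what I have in mind, in case it helps (the device names
below are just placeholders for my disks):

  # create a 4-drive RAID 6 array
  mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde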

My research/googling suggests ext3 supports 16TB volumes if the
block size is 4096 bytes, but some sites suggest the 32-bit arch
restricts it to 4TB no matter what block size I use. So, does ext3
(and the relevant utilities, particularly resize2fs and e2fsck) on
the 32-bit i386 arch support 16TB volumes?
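
For what it's worth, this is roughly how I was planning to create
and check the filesystem (md0 is a placeholder for the array
device):

  # force a 4096-byte block size at creation time
  mkfs.ext3 -b 4096 /dev/md0
  # confirm the block size and block count afterwards
  tune2fs -l /dev/md0 | grep -i 'block size\|block count'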

I intend to use mdadm to build / run the array. If an unrecoverable
read error (a bad block that the on-disk circuitry can't resolve) is
discovered on a disk, then how does mdadm handle this? It appears
the possibilities are:
1) the disk gets marked as failed in the array - ext3 does not get
notified of a bad block
2) mdadm uses free space to construct a new stripe (from the
remaining RAID data) to replace the bad one - ext3 does not get
notified of a bad block
3) mdadm passes the requested data (again reconstructed from the
remaining good blocks) up to ext3 and then tells ext3 that all the
blocks from that stripe are now bad, so ext3 can deal with it (ext3
can mark and reallocate storage locations if it is told about bad
blocks).

I would really like to hear it is either 2 or 3, as I would prefer
not to have an entire disk immediately marked bad due to one
unrecoverable read error - I would rather be notified instead, so
that RAID 6 is still protecting "most" of the data until the disk
gets replaced.

Regards, Tim.



I'm afraid opinions on RAID vary widely on this list (no surprise), but you may be interested to note that there is a rough consensus here that software RAID 6 is an unfortunate choice.

I believe the answer to your question is none of the above; the closest is (2). As I'm sure you know, RAID 6 uses block-level striping. What happens is a matter of policy, but I believe data that is thought lost is recovered from parity and rewritten to the array.[0] The error is logged, and the status of the drive is changed. If the drive doesn't fail outright, then depending on policy[1] it may be re-verified or dropped out of the array. Either way, mdadm handles the error, because the failure occurs at a lower level than ext3.
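
I don't recall exactly which policy mdadm follows here, but you can
watch it yourself: the kernel's md driver exposes a scrubbing
interface through sysfs (md0 below is a placeholder for your array),
so you can force a read-check and see how read errors and mismatches
get reported.

  # overall array state and per-device status
  mdadm --detail /dev/md0
  cat /proc/mdstat
  # ask md to read the whole array and record/repair what it finds
  echo check > /sys/block/md0/md/sync_action
  # mismatches found during the last check
  cat /sys/block/md0/md/mismatch_cnt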

The problem is when the drive is completely, 100% in use (no spare capacity). In that case, no new stripe is created, because there is no room to put one. The data is moved to an unused area[1], and the status of the drive is changed (your scenario 1). ext3 is still unaware.

The file system is a logical layer on top of RAID, and only becomes aware of changes to the underlying disk structure when that is unavoidable. RAID guarantees a certain capacity: if you create a volume with 1 TB capacity, the volume will always have that capacity.
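
Since you plan to add drives later, the usual sequence (a sketch
from memory, so check the man pages; md0 and sdf are placeholders,
and reshaping a RAID 6 needs a reasonably recent kernel and mdadm)
is to grow the array first and then the filesystem:

  # add the new disk and reshape the array onto it
  mdadm --add /dev/md0 /dev/sdf
  mdadm --grow /dev/md0 --raid-devices=5
  # once the reshape finishes, grow ext3 into the new space
  resize2fs /dev/md0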

If you set this up, be sure to also combine it with LVM2. Then you have much greater flexibility about what to do when recovering from failures.
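
A minimal sketch of that layering, assuming the array is /dev/md0
and using placeholder names for the volume group and logical volume:

  # put LVM on top of the md device
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  # carve out a logical volume and put ext3 on it
  lvcreate -L 1T -n data vg0
  mkfs.ext3 /dev/vg0/data
  # later, extend the LV and the filesystem as the array grows
  lvextend -L +500G /dev/vg0/data
  resize2fs /dev/vg0/data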


[0] This depends on the implementation, and I don't know what mdadm does. Some implementations might do this automatically, but I think most would require a rebuild.

[1] Again, I forget what mdadm does in this case.  Anybody?



I'm sorry, I seem to have avoided answering a crucial part of your question. I think the md device documentation (the md(4) and mdadm(8) man pages) is what you want.


MAA





