On 4/26/2010 9:29 AM, Tim Clewlow wrote:
Hi there,

I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
but the intention is to add more drives as storage requirements
increase.
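
Roughly what I have in mind, in case it helps (the device names
below are just placeholders for my disks):

  # create a 4-drive RAID 6 array
  mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde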

My research/googling suggests ext3 supports 16TB volumes if the
block size is 4096 bytes, but some sites suggest the 32-bit arch
restricts it to 4TB no matter what block size I use. So, does ext3
(and the relevant utilities, particularly resize2fs and e2fsck) on
the 32-bit i386 arch support 16TB volumes?
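
For what it's worth, this is roughly how I was planning to create
and check the filesystem (md0 is a placeholder for the array
device):

  # force a 4096-byte block size at creation time
  mkfs.ext3 -b 4096 /dev/md0
  # confirm the block size and block count afterwards
  tune2fs -l /dev/md0 | grep -i 'block size\|block count'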

I intend to use mdadm to build / run the array. If an unrecoverable
read error (a bad block that the on-disk circuitry can't resolve) is
discovered on a disk, then how does mdadm handle this? It appears
the possibilities are:
1) the disk gets marked as failed in the array - ext3 does not get
notified of a bad block
2) mdadm uses free space to construct a new stripe (from the
remaining RAID data) to replace the bad one - ext3 does not get
notified of a bad block
3) mdadm passes the requested data (again reconstructed from the
remaining good blocks) up to ext3 and then tells ext3 that all the
blocks from that stripe are now bad, so ext3 can deal with it (ext3
can mark and reallocate storage locations if it is told about bad
blocks).

I would really like to hear it is either 2 or 3, as I would prefer
not to have an entire disk immediately marked bad due to one
unrecoverable read error - I would rather be notified instead, so
that RAID 6 is still protecting "most" of the data until the disk
gets replaced.

Regards, Tim.



I'm afraid opinions on RAID vary widely on this list (no surprise), but you may be interested to note that there is a rough consensus here that software RAID 6 is an unfortunate choice.

I believe the answer to your question is none of the above; the closest is (2). As I'm sure you know, RAID 6 uses block-level striping. What happens is a matter of policy, but I believe data that is thought lost is recovered from parity and rewritten to the array.[0] The error is logged, and the status of the drive is changed. If the drive doesn't fail outright, then depending on policy[1] it may be re-verified or dropped out of the array. Either way, mdadm handles the error, because the failure occurs at a lower level than ext3.
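
I don't recall exactly which policy mdadm follows here, but you can
watch it yourself: the kernel's md driver exposes a scrubbing
interface through sysfs (md0 below is a placeholder for your array),
so you can force a read-check and see how read errors and mismatches
get reported.

  # overall array state and per-device status
  mdadm --detail /dev/md0
  cat /proc/mdstat
  # ask md to read the whole array and record/repair what it finds
  echo check > /sys/block/md0/md/sync_action
  # mismatches found during the last check
  cat /sys/block/md0/md/mismatch_cnt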

The problem is when the drive is completely, 100% in use (no spare capacity). In that case, no new stripe is created, because there is no room to put one. The data is moved to an unused area[1], and the status of the drive is changed (your scenario 1). ext3 is still unaware.

The file system is a logical layer on top of RAID, and only becomes aware of changes to the underlying disk structure when that is unavoidable. RAID guarantees a certain capacity: if you create a volume with 1 TB capacity, the volume will always have that capacity.
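
Since you plan to add drives later, the usual sequence (a sketch
from memory, so check the man pages; md0 and sdf are placeholders,
and reshaping a RAID 6 needs a reasonably recent kernel and mdadm)
is to grow the array first and then the filesystem:

  # add the new disk and reshape the array onto it
  mdadm --add /dev/md0 /dev/sdf
  mdadm --grow /dev/md0 --raid-devices=5
  # once the reshape finishes, grow ext3 into the new space
  resize2fs /dev/md0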

If you set this up, be sure to also combine it with LVM2. Then you have much greater flexibility about what to do when recovering from failures.
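
A minimal sketch of that layering, assuming the array is /dev/md0
and using placeholder names for the volume group and logical volume:

  # put LVM on top of the md device
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  # carve out a logical volume and put ext3 on it
  lvcreate -L 1T -n data vg0
  mkfs.ext3 /dev/vg0/data
  # later, extend the LV and the filesystem as the array grows
  lvextend -L +500G /dev/vg0/data
  resize2fs /dev/vg0/data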


[0] This depends on the implementation, and I don't know what mdadm does. Some implementations might do this automatically, but I think most would require a rebuild.

[1] Again, I forget what mdadm does in this case.  Anybody?



I'm sorry, I seem to have avoided answering a crucial part of your question. I think the md device documentation (the md(4) and mdadm(8) man pages) is what you want.


MAA





