On Sunday, May 3, 2020 6:27 PM, Jack <[email protected]> wrote:
> Minor point - you have one duplicate line there ". f f ." which is the
> second and last line of the second group. No effect on anything else in
> the discussion.
thanks.
> Trying to help thinking about odd numbers of disks, if you are still
> allowing only one disk to fail, then you can think about mirroring half
> disks, so each disk has half of it mirrored to a different disk, instead
> of drives always being mirrored in pairs.
that definitely helped get me unstuck and continue
thinking. thanks.
curious. how do people look at --layout=n2 in the
storage industry? e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects against
1 disk failure?
i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).
e.g. a 4-disk RAID10 with --layout=n2 gives
1*4/10 + 2*4/10 = 1.2
expected recoverable disk failures. details are
below:
F . . . < recoverable
. F . . < cases with
. . F . < 1 disk
. . . F < failure
F . . F < recoverable
. F F . < cases with
. F . F < 2 disk
F . F . < failures
F F . . < not recoverable
. . F F < cases with 2 disk
< failures
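the table above can be checked by brute force. this is only a sketch: it assumes that disks (0, 1) and (2, 3) are the two mirrored pairs of the 4-disk array, and the names `MIRROR_PAIRS`, `is_recoverable` and `expected_recoverable_failures` are made up for illustration.

```python
from itertools import combinations

# Assumption: disks (0, 1) and (2, 3) are the two mirrored pairs of a
# 4-disk --layout=n2 array, matching the table above.
MIRROR_PAIRS = [{0, 1}, {2, 3}]
N_DISKS = 4


def is_recoverable(failed):
    """Data survives unless both members of some mirror pair have failed."""
    failed = set(failed)
    return not any(pair <= failed for pair in MIRROR_PAIRS)


def expected_recoverable_failures(max_failures=2):
    """Weight each failure count by its share of recoverable cases among
    all 1- and 2-disk failure cases, as in the 1*4/10 + 2*4/10 sum."""
    cases = [c for k in range(1, max_failures + 1)
             for c in combinations(range(N_DISKS), k)]
    return sum(len(c) for c in cases if is_recoverable(c)) / len(cases)


print(expected_recoverable_failures())  # 1.2
```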
now, if we do a 5-disk --layout=n2, we get:
1 (1) 2 (2) 3
(3) 4 (4) 5 (5)
6 (6) 7 (7) 8
(8) 9 (9) 10 (10)
11 (11) 12 (12) 13
(13) ...
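the pattern above can be generated with a few lines. this is a toy model of the picture only, not md's actual placement code; `near2_layout` is a hypothetical helper, and it assumes each chunk and its mirror occupy consecutive slots that wrap around to the next row.

```python
def near2_layout(n_disks, n_rows):
    """Return rows of labels: chunk k appears as 'k' and its mirror as '(k)',
    in consecutive slots that wrap from one row to the next."""
    slots = []
    chunk = 1
    while len(slots) < n_disks * n_rows:
        slots.append(str(chunk))
        slots.append(f"({chunk})")
        chunk += 1
    slots = slots[:n_disks * n_rows]
    return [slots[r * n_disks:(r + 1) * n_disks] for r in range(n_rows)]


# Print the first two rows of the 5-disk picture, one column per disk.
for row in near2_layout(5, 2):
    print(" ".join(f"{cell:>4}" for cell in row))
```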
obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.
there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:
xxx (1) xxx (2) 3
xxx 4 xxx 5 (5)
xxx (1) 2 xxx 3
xxx 4 (4) xxx (5)
1 xxx 2 xxx 3
(3) xxx (4) xxx (5)
1 xxx 2 (2) xxx
(3) xxx (4) 5 xxx
1 (1) xxx (2) xxx
(3) 4 xxx 5 xxx
so, expected recoverable disk failures for a
5-disk RAID10 --layout=n2 is:
1*5/15 + 2*5/15 = 1
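the same count can be done for any number of disks. a rough sketch, assuming the slot model from the picture above (copy j of chunk k sits in slot 2*k + j, and slot s lives on disk s mod n); `chunk_disks`, `survives`, `recoverable_counts` and `expected_recoverable` are names invented here.

```python
from fractions import Fraction
from itertools import combinations


def chunk_disks(n_disks):
    """Disks holding the two copies of each chunk, for one full period of
    the pattern. Assumes slot 2*k + j holds copy j of chunk k."""
    return [{(2 * k) % n_disks, (2 * k + 1) % n_disks} for k in range(n_disks)]


def survives(failed, n_disks):
    """True if every chunk still has at least one copy on a healthy disk."""
    failed = set(failed)
    return all(not disks <= failed for disks in chunk_disks(n_disks))


def recoverable_counts(n_disks, max_failures=2):
    """Return {k: (recoverable cases, total cases)} for k failed disks."""
    counts = {}
    for k in range(1, max_failures + 1):
        cases = list(combinations(range(n_disks), k))
        counts[k] = (sum(survives(c, n_disks) for c in cases), len(cases))
    return counts


def expected_recoverable(n_disks, max_failures=2):
    """Sum of k * (recoverable k-disk cases) / (all cases), as above."""
    counts = recoverable_counts(n_disks, max_failures)
    total = sum(t for _, t in counts.values())
    return sum(Fraction(k * ok, total) for k, (ok, _) in counts.items())


print(recoverable_counts(5))    # {1: (5, 5), 2: (5, 10)}
print(expected_recoverable(5))  # 1
```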
so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by half a
disk's worth of storage, while the expected number
of recoverable disk failures drops by 0.2.
but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:
= 1 * 6 / (6 + nchoosek(6,2))
+ 2 * (nchoosek(6,2) - 3) / (6 + nchoosek(6,2))
= 6/21 + 2 * 12/21
= 1.4286 expected recoverable failing disks.
also, 12 of the 15 possible 2 disk failures are
recoverable, i.e. there is 80% chance of surviving
a 2 disk failure.
so, i wonder, is it a bad decision to go with an
odd number of disks with a RAID10? what is the
right way to think about finding an answer to this
question?
i guess the ultimate answer needs knowledge of
these:
* F1: probability of having 1 disk fail within
the repair window.
* F2: probability of having 2 disks fail within
the repair window.
* F3: probability of having 3 disks fail within
the repair window.
.
.
.
* Fn: probability of having n disks fail within
the repair window.
* R1: probability of surviving a 1 disk failure.
equals 1 with all related cases.
* R2: probability of surviving a 2 disk failure.
equals 1/3 with 5-disk RAID10
equals 0.8 with a 6-disk RAID10.
* R3: probability of surviving a 3 disk failure.
equals 0 with all related cases.
.
.
.
* Rn: probability of surviving an n disk failure.
equals 0 with all related cases.
* L : expected cost of losing data on an array.
* D : price of a disk.
this way, the absolute expected cost when adopting
a 6-disk RAID10 is:
= 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
= 6D + 0 + F2*(0.2)*L + F3*(1-0)*L + ...
and the absolute cost for a 5-disk RAID10 is:
= 5D + F1*(1-1)*L + F2*(1-0.3333)*L + F3*(1-0)*L + ...
= 5D + 0 + F2*(0.6667)*L + F3*(1-0)*L + ...
canceling identical terms, the costs that differ are:
6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.6667*F2*L
from here [1] we know that a 1TB disk costs
$35.85, so:
6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.6667*F2*L
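the cost formula can be written as a small function. a sketch only: the R values are taken as listed above, D is the price from [1], and the values for L and F1..F3 are placeholders made up to make the code run, not measurements. `expected_cost` is a hypothetical name.

```python
def expected_cost(n_disks, disk_price, loss_cost, fail_prob, survive_prob):
    """n*D + sum over k of F_k * (1 - R_k) * L.

    fail_prob and survive_prob map a failure count k to F_k and R_k.
    """
    risk = sum(fail_prob[k] * (1 - survive_prob[k]) * loss_cost
               for k in fail_prob)
    return n_disks * disk_price + risk


D = 35.85                          # disk price from [1]
L = 1000.0                         # placeholder: cost of losing the array
F = {1: 0.1, 2: 0.01, 3: 0.001}    # placeholders, not measured values

R_5DISK = {1: 1.0, 2: 1 / 3, 3: 0.0}   # R values as listed above
R_6DISK = {1: 1.0, 2: 0.8, 3: 0.0}

cost5 = expected_cost(5, D, L, F, R_5DISK)
cost6 = expected_cost(6, D, L, F, R_6DISK)

# Only the disk price and the F2 term differ between the two arrays.
print(cost6 - cost5)
```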
now, at which point is a 5-disk array a better
economic decision than a 6-disk one? for
simplicity, let LOL = F2*L:
5*35.85 + 0.6667 * LOL < 6*35.85 + 0.2 * LOL
0.6667*LOL - 0.2 * LOL < 6*35.85 - 5*35.85
LOL * (0.6667 - 0.2) < 6*35.85 - 5*35.85
LOL < (6*35.85 - 5*35.85) / (0.6667 - 0.2)
LOL < 76.816
F2*L < 76.816
so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:
F2*L < 76.816 bucks.
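the rearrangement above boils down to one division. a minimal sketch, taking the 0.6667 and 0.2 coefficients above as given; `break_even_risk` is a made-up name.

```python
def break_even_risk(disk_price, unsurvivable_5, unsurvivable_6):
    """Largest F2*L for which 5 disks are still cheaper than 6.

    unsurvivable_5 and unsurvivable_6 are the (1 - R2) coefficients of the
    5-disk and 6-disk arrays; the extra disk costs disk_price.
    """
    return disk_price / (unsurvivable_5 - unsurvivable_6)


print(round(break_even_risk(35.85, 0.6667, 0.2), 3))  # 76.816
```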
this site [2] says that 76% of seagate disks fail
per year (:D). and since disks mostly fail
independently of each other, the probability of
having 2 disks fail in a year is:
F2_year = 0.76*0.76
= 0.5776
but what is F2_week? each year has 52.1429 weeks.
let's be generous and assume that disk failures are
uniformly distributed across the year (e.g. suppose
that we bought them randomly, and not in a
single batch).
in this case, the probability of 2 disks failing
in the same week (suppose that our repair window
is 1 week):
F2 = 0.5776 * 52 / (52 + nchoosek(52, 2))
= 0.5776 * 0.037736
= 0.021796
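the arithmetic for F2 can be reproduced like this. a sketch that follows the steps above literally (same 0.76 figure, same 52 / (52 + nchoosek(52, 2)) factor); the variable names are made up.

```python
from math import comb

AFR = 0.76    # per-disk yearly failure figure used above
WEEKS = 52    # repair window of 1 week, so 52 windows per year

# Both disks fail within the same year, assuming independent failures.
f2_year = AFR * AFR

# Share of cases where both failures land in the same week, as counted above.
same_week = WEEKS / (WEEKS + comb(WEEKS, 2))

f2_week = f2_year * same_week
print(round(f2_year, 4), round(same_week, 6), round(f2_week, 6))
# 0.5776 0.037736 0.021796
```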
let's substitute a bit:
F2 * L < 76.816 bucks.
0.021796 * L < 76.816 bucks.
L < 76.816 / 0.021796 bucks.
L < 3524.3 bucks.
so, in summary:
/------------------------------------------------\
| a 5-disk RAID10 is better than a 6-disk RAID10 |
| ONLY IF your data is WORTH LESS than 3,524.3 |
| bucks. |
\------------------------------------------------/
any thoughts? i'm a newbie. i wonder how
industry people think?
happy quarantine,
cm
------------
[1] https://www.amazon.com/WD-Blue-1TB-Hard-Drive/dp/B0088PUEPK/
[2] https://www.seagate.com/em/en/support/kb/hard-disk-drive-reliability-and-mtbf-afr-174791en/