I'm setting up a web server with RAID-1, using raidtools 0.90-5
and Linux kernel 2.2.12 (this is the Red Hat 6.1 distribution). I want
to mirror all my data across two disks (hda and hdc).
The problem I've noticed from testing is that if I shut off the power
and then reboot, the raidtools software will start re-syncing the
mirrors, even though there was no write activity at all when the power
went off and even though both parts of the mirror have the exact same
event counter.
The problem I see with this is as follows:
- Assume a power outage hits and wipes out some sectors on the
  hda disk, but leaves the superblock alone. I think this scenario
  is a fairly likely one.
- After the power outage, the system boots up and starts up a
  resync, copying data from hda to hdc.
- The system tries to access the bad sectors on hda.
What would happen at this point? I assume the data would be lost,
since hdc is undergoing a re-sync and the sectors on hda are already
bad. Even though hdc contained good copies of these sectors at boot
time, the RAID software started re-syncing onto hdc and lost that data.
If, however, the RAID code had just left hdc alone, it could have
recovered these sectors.
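To make the scenario concrete, here is a toy Python model of it. This is
only a sketch of my concern, not the real md code: the names
(mirror_read, hdc_in_sync, BAD) are made up for illustration, and it
assumes that a disk being resynced onto is not trusted as a source for
reads.

```python
class ReadError(Exception):
    """Raised when a sector cannot be read from a disk."""


BAD = None  # marker for an unreadable sector in this toy model


def read_sector(disk, n):
    """Return the contents of sector n, or raise ReadError if it is bad."""
    data = disk[n]
    if data is BAD:
        raise ReadError(f"sector {n} unreadable")
    return data


def mirror_read(hda, hdc, n, hdc_in_sync):
    """Read sector n from the mirror.

    Assumption: hdc is only used as a fallback when it is considered
    in sync. While it is a resync target, its contents are not trusted.
    """
    try:
        return read_sector(hda, n)
    except ReadError:
        if not hdc_in_sync:
            raise  # no trusted copy left
        return read_sector(hdc, n)


# Sector 1 on hda was damaged by the power outage; hdc is still intact.
hda = ["A", BAD, "C"]
hdc = ["A", "B", "C"]

# Mirror left alone: the good copy on hdc is used.
recovered = mirror_read(hda, hdc, 1, hdc_in_sync=True)

# Resync in progress onto hdc: the read of the bad sector fails.
try:
    mirror_read(hda, hdc, 1, hdc_in_sync=False)
    lost = False
except ReadError:
    lost = True
```

In this model the same damaged sector is recoverable when hdc is left
alone and unreadable once hdc has been demoted to a resync target.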
I looked at the raidtools code, and it looks to me as though there is
an SB_CLEAN flag in the superblock that is set to false when RAID is
started on an md device. This SB_CLEAN flag is only set to true if a
clean shutdown is performed. So if a power outage hits, this flag is
always going to be false, since no clean shutdown is performed. At boot
time the md code then checks the SB_CLEAN flag, and if it is false a
resync is performed.
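Here is how I understand that flag logic, written as a small Python
sketch. It reflects my reading of the code rather than the code itself:
the Superblock class and the start_array/stop_array functions are
hypothetical names, and only the clean flag and the event counter are
modelled.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    """Toy superblock: only the fields discussed here."""
    events: int = 0
    clean: bool = True  # stands in for the SB_CLEAN flag


def start_array(sb):
    """Start the array; return True if a resync would be triggered.

    The flag is checked first, then cleared for as long as the
    array is running.
    """
    needs_resync = not sb.clean
    sb.clean = False
    return needs_resync


def stop_array(sb):
    """Clean shutdown: only this path sets the flag back to true."""
    sb.clean = True


sb = Superblock()

first_boot = start_array(sb)        # flag was true: no resync
stop_array(sb)                      # clean shutdown
after_clean_shutdown = start_array(sb)

# Power outage: stop_array() never runs, so the flag stays false,
# whether or not any write was in flight.
after_power_loss = start_array(sb)
```

The point of the sketch is that after_power_loss comes out true even
though nothing was written between the two starts.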
It seems to me that a resync should only be required if the system was
in the middle of a write where some data had been sent to one disk but
not yet to the other. I think the event counter already performs this
function, so I don't see why the SB_CLEAN flag is even needed.
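What I have in mind is roughly the check below, again as a Python
sketch. The function name is made up, and the sketch rests on my
assumption that differing event counters are enough to detect a
half-finished write, which is exactly the part I would like confirmed.

```python
def needs_resync_by_events(events_hda, events_hdc):
    """Proposed rule: resync only when the two event counters differ.

    Assumption (unverified): an interrupted write always leaves the
    two halves of the mirror with different event counters.
    """
    return events_hda != events_hdc


# Power loss with no write in flight: counters match, mirror left alone.
idle_power_loss = needs_resync_by_events(42, 42)

# One half got ahead of the other: resync required.
interrupted_write = needs_resync_by_events(43, 42)
```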
What do you think? Could this SB_CLEAN flag be eliminated to reduce the
risk of a resync damaging good data?