>>> Dimitri Maziuk <[email protected]> schrieb am 30.11.2011 um 15:54 in
Nachricht <[email protected]>:
> On 11/30/2011 2:01 AM, Ulrich Windl wrote:
>>>>> Dimitri Maziuk<[email protected]>  schrieb am 29.11.2011 um 19:36 in
> > Nachricht<[email protected]>:
> >> On 11/29/2011 07:49 AM, Lars Marowsky-Bree wrote:
> >>
> >>> (But the mdadm operations the RA does also shouldn't cause data
> >>> corruption. That strikes me as an MD bug.)
> >>
> >> If you repeatedly try to re-sync with a dying disk, with each resync
> >> interrupted by i/o error, you will get data corruption sooner or later.
> >> It's only MD bug in a sense that MD can't actually stop you from
> >> shooting yourself.
> >
> > I'd like to know more details: Which disk has an I/O error: source
> > or
> destination of the sync. How is data corruption created?
> 
> Well that's the point: if you have 2 disks, and neither has failed yet, 
> how do you pick the one that isn't failing?
> 
> Specific failure mode I'm talking about is busy relocating bad sectors. 
> Until the SMART counter hits the threshold value it's "not failed", but 
> you'll see sata timeouts/resets in /var/log/messages with spiking i/o 
> wait and those "all sorts of hangs" Lars mentioned. If mdadm decides to 
> use that disk as the source, you have a race: either SMART will fail the 
> disk before it starts dropping bits or develops an unrelocatable bad 
> sector, or said bad sectors will get copied to the mirror disk.

Hi!

OK, so the RAID1 is writing to both disks normally when one disk starts to have 
a problem, and that disk will relocate sectors. Assuming the write to the disk 
succeeds, the relocation must have been successful. Write performance may be 
bad, though.

When reading back such sectors, the read should succeed. However, there are 
multiple firmware bugs in the wild that make disks return bad data from the 
cache when retrying a read, instead of returning the proper data that was read 
successfully.

And in my experience: don't wait until SMART declares the disk bad. Drop the 
disk once it starts relocating more than one sector per week, or once more than 
100 sectors have been relocated. I have seen disks that were unable to pass a 
self-test while SMART still considered them "OK"...
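The rule of thumb above can be sketched as a small script. This is a hedged example, not a tested monitoring tool: the 100-sector threshold comes from the text, the sample line mimics typical `smartctl -A` output, and on a real system you would replace the here-string with `smartctl -A /dev/sdX` (run as root).

```shell
#!/bin/sh
# Sketch: flag a disk once its Reallocated_Sector_Ct raw value exceeds a
# threshold. SAMPLE is a stand-in for real `smartctl -A /dev/sdX` output.
THRESHOLD=100
SAMPLE='  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       142'

# Field 2 is the attribute name, field 10 the raw value in smartctl's table.
REALLOC=$(echo "$SAMPLE" | awk '$2 == "Reallocated_Sector_Ct" { print $10 }')

if [ "$REALLOC" -gt "$THRESHOLD" ]; then
    echo "disk exceeds threshold: $REALLOC reallocated sectors"
fi
```

A real deployment would also track the rate of change (the "more than a sector per week" part), e.g. by storing the previous count and comparing on each run.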

> 
> Granted, I've only seen data corruption on sata raid-1 once so far. But 
> once is enough.

Honestly, MD-RAID can do little if the write succeeded but the disk is later 
unable to reproduce the data that was written (i.e. the disk returns wrong 
data). However, if the disk returns an error, MD-RAID should use the data from 
the other disk; if it doesn't do that, it's a severe bug. MD-RAID should also 
know at any time which disk has the latest data.
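For the "disk returns wrong data without an error" case, MD at least lets you detect disagreement between the mirrors via the sysfs check interface. A hedged sketch, assuming an array at /dev/md0 (adjust the path for your array; writing to sync_action requires root):

```shell
#!/bin/sh
# Sketch: ask MD to read both halves of a RAID1 and count differences.
MD=/sys/block/md0/md
if [ -d "$MD" ]; then
    echo check > "$MD/sync_action"   # start a verify pass over the array
    cat /proc/mdstat                 # shows the check progress
    cat "$MD/mismatch_cnt"           # non-zero => the mirrors disagree
else
    echo "no md0 array present"
fi
```

Note that a check only tells you the mirrors differ; with two-disk RAID1 there is no vote to decide which copy is the correct one.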

> 
> (Rumour has it, it's worse with raid-5 since that only protects from 
> data loss if all chunks are committed to disk at once and not stuck in a 
> write cache waiting for the elevator.)

Barriers should fix that, I guess.
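For completeness: at that time barriers had to be enabled explicitly for ext3 on MD, while ext4 enabled them by default. A hedged /etc/fstab example (device and mount point are placeholders):

```
/dev/md0  /data  ext4  defaults,barrier=1  0  2
```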

My problem was different, however: if you remove a disk logically (i.e. by 
command) from a RAID, the remaining RAID should assume that that particular 
disk no longer has valid data on it. That is different from simply 
disconnecting the disk: in that case, if the disk reappears, the RAID may check 
the drive for outdated blocks.

What happened in my case was that a disk that had been removed by command, and 
then reduced in size, was automatically re-added to the RAID, on the assumption 
that the data on it were still intact. The same happened when the size of that 
(still "removed") disk was grown back to its previous size.

I know that disks don't usually change size, but that is why I set the resource 
to unmanaged, and why I removed the disk from the RAID. Unfortunately, neither 
had the desired effect.
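One way to make a by-command removal stick is to also wipe the MD superblock, so the kernel treats the disk as blank if it ever reappears. A hedged sketch (array and device names are placeholders; all three commands need root and the last one is destructive to the member's metadata):

```shell
#!/bin/sh
# Sketch: detach a member disk and erase its MD membership metadata so the
# array cannot silently re-add it with stale data.
drop_member() {
    ARRAY=$1   # e.g. /dev/md0
    DEV=$2     # e.g. /dev/sdb1
    mdadm "$ARRAY" --fail   "$DEV" &&
    mdadm "$ARRAY" --remove "$DEV" &&
    mdadm --zero-superblock "$DEV"   # forget the old array membership
}
# usage (as root): drop_member /dev/md0 /dev/sdb1
```

Without the --zero-superblock step, the removed disk still carries the array's metadata, which is what makes automatic re-assembly possible.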

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems