>>> Dimitri Maziuk <[email protected]> wrote on 30.11.2011 at 15:54 in message
<[email protected]>:
> On 11/30/2011 2:01 AM, Ulrich Windl wrote:
> >>>>> Dimitri Maziuk <[email protected]> wrote on 29.11.2011 at 19:36 in
> > message <[email protected]>:
> >> On 11/29/2011 07:49 AM, Lars Marowsky-Bree wrote:
> >>
> >>> (But the mdadm operations the RA does also shouldn't cause data
> >>> corruption. That strikes me as an MD bug.)
> >>
> >> If you repeatedly try to re-sync with a dying disk, with each resync
> >> interrupted by i/o error, you will get data corruption sooner or later.
> >> It's only MD bug in a sense that MD can't actually stop you from
> >> shooting yourself.
> >
> > I'd like to know more details: Which disk has an I/O error: source or
> > destination of the sync? How is data corruption created?
>
> Well that's the point: if you have 2 disks, and neither has failed yet,
> how do you pick the one that isn't failing?
>
> Specific failure mode I'm talking about is busy relocating bad sectors.
> Until the SMART counter hits the threshold value it's "not failed", but
> you'll see sata timeouts/resets in /var/log/messages with spiking i/o
> wait and those "all sorts of hangs" Lars mentioned. If mdadm decides to
> use that disk as the source, you have a race: either SMART will fail the
> disk before it starts dropping bits or develops an unrelocatable bad
> sector, or said bad sectors will get copied to the mirror disk.
Hi!

OK, so the RAID1 is writing to both disks normally when one disk starts
to have a problem, and that disk relocates sectors. Assuming the write to
the disk succeeds, the relocation must have been successful, though write
performance may suffer. When reading back such sectors, the read should
succeed. However, there are multiple firmware bugs in the wild that make
a disk return bad data from its cache when a read is retried, instead of
the correct data that were read successfully.

And in my experience: don't wait until SMART has declared the disk as
bad. Drop the disk once it starts to relocate more than one sector per
week, or once more than 100 sectors have been relocated. I have seen
disks that were unable to pass a self-test while SMART still considered
them "OK"...

> Granted, I've only seen data corruption on sata raid-1 once so far. But
> once is enough.

Honestly, MD RAID can do little if the write succeeded but the disk is
later unable to reproduce the data that were written (i.e. if the disk
returns wrong data). However, if the disk returns an error, MD RAID
should use the data from the other disk; if it doesn't do that, it's a
severe bug. Also, MD RAID should know at any time which disk has the
latest data.

> (Rumour has it, it's worse with raid-5 since that only protects from
> data loss if all chunks are committed to disk at once and not stuck in a
> write cache waiting for the elevator.)

Barriers should fix that, I guess.

My problem was different, however: if you remove a disk logically (i.e.
by command) from a RAID, the remaining RAID should assume that this
particular disk no longer holds valid data. That is different from simply
disconnecting the disk: then, if the disk reappears, the RAID may check
the drive for outdated blocks. What happened in my case was that a disk
that had been removed by command, and had then been reduced in size, was
automatically re-added to the RAID on the assumption that the data on it
were still there.
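As an aside, the "drop it early" heuristic above can be automated by
watching the raw Reallocated_Sector_Ct attribute in `smartctl -A` output.
A minimal sketch (the threshold of 100 follows the rule of thumb above;
the sample line and the pipe target are illustrative, in real use you
would feed it `smartctl -A /dev/sda`):

```shell
#!/bin/sh
# check_realloc: print the raw Reallocated_Sector_Ct value found in
# smartctl -A output read from stdin.
check_realloc() {
  awk '$2 == "Reallocated_Sector_Ct" { print $NF }'
}

# Sample attribute line in the format smartmontools produces:
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       123'

count=$(printf '%s\n' "$sample" | check_realloc)

# Drop threshold of 100 relocated sectors, per the advice above.
if [ "$count" -gt 100 ]; then
  echo "disk exceeds reallocation threshold: $count sectors"
fi
```

In a real deployment you would run this from cron (or let smartd's own
attribute tracking do it) rather than relying on the overall SMART
health verdict.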
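For reference, one way to make such a removal stick is to wipe the md
superblock after removing the member, so that udev/mdadm cannot
recognize and auto-re-add the disk later. A sketch, with `/dev/md0` and
`/dev/sdb1` as placeholder names (these are not the devices from my
setup):

```shell
# Mark the member as failed, remove it from the array, then erase its
# md metadata so it is no longer recognized as an array member.
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
mdadm --zero-superblock /dev/sdb1
```

Obviously this only helps when the removal is intentional and permanent;
it would not have prevented the re-add in my case unless run before the
disk was resized.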
The same automatic re-add happened when the size of that (still
"removed") disk was grown back to its previous size. I know that disks
don't usually change size, but that is exactly why I had set the
resource to unmanaged, and why I had removed the disk from the RAID.
Unfortunately, neither had the desired effect.

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
