Re: nonzero mismatch_cnt with no earlier error

2007-02-25 Thread Christian Pernegger
Sorry to hijack the thread a little but I just noticed that the mismatch_cnt for my mirror is at 256. I'd always thought the monthly check done by the mdadm Debian package does repair as well - apparently it doesn't. So I guess I should run repair but I'm wondering ... - is it safe / bugfree

Re: nonzero mismatch_cnt with no earlier error

2007-02-25 Thread Bill Davidsen
Justin Piszcz wrote: On Sat, 24 Feb 2007, Michael Tokarev wrote: Jason Rainforest wrote: I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5, multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200+). I then ordered a resync. The mismatch_cnt returned to 0 at

Re: nonzero mismatch_cnt with no earlier error

2007-02-25 Thread Justin Piszcz
On Sun, 25 Feb 2007, Christian Pernegger wrote: Sorry to hijack the thread a little but I just noticed that the mismatch_cnt for my mirror is at 256. I'd always thought the monthly check done by the mdadm Debian package does repair as well - apparently it doesn't. So I guess I should run

Re: nonzero mismatch_cnt with no earlier error

2007-02-25 Thread Neil Brown
On Saturday February 24, [EMAIL PROTECTED] wrote: But is this not a good opportunity to repair the bad stripe for a very low cost (no complete resync required)? In this case, 'md' knew nothing about an error. The SCSI layer detected something and thought it had fixed it itself. Nothing for md

Re: nonzero mismatch_cnt with no earlier error

2007-02-25 Thread Jeff Breidenbach
Ok, so hearing all the excitement I ran a check on a multi-disk RAID-1. One of the RAID-1 disks failed out, maybe by coincidence but presumably due to the check. (I also have another disk in the array deliberately removed as a backup mechanism.) And of course there is a big mismatch count.

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Justin Piszcz
Of course you could just run repair but then you would never know that mismatch_cnt was > 0. Justin. On Sat, 24 Feb 2007, Justin Piszcz wrote: Perhaps. The way it works (I believe) is as follows:
1. echo check > sync_action
2. If mismatch_cnt > 0 then run:
3. echo repair > sync_action
4. Re-run
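
The steps listed in this message can be sketched as a small shell script. This is only an illustration under stated assumptions: the function names are hypothetical, the 10-second polling interval is arbitrary, and the final re-check is a guess at what the truncated step 4 says. The md sysfs directory is passed as an argument; /sys/block/md0/md is the path used elsewhere in the thread.

```shell
#!/bin/sh
# Sketch of the check -> repair -> re-check sequence from the message above.
# Function names and the polling interval are illustrative assumptions.

# Print the current mismatch count for the md sysfs directory in $1.
read_mismatch() {
    cat "$1/mismatch_cnt"
}

# Block until the array in $1 reports "idle" in sync_action.
wait_idle() {
    while [ "$(cat "$1/sync_action")" != "idle" ]; do
        sleep 10
    done
}

# Decide the follow-up for a mismatch count: "repair" if nonzero, else "none".
next_action() {
    if [ "$1" -gt 0 ]; then
        echo repair
    else
        echo none
    fi
}

# Full sequence. Needs root and a real array, for example:
#   check_then_repair /sys/block/md0/md
check_then_repair() {
    md_dir=$1
    echo check > "$md_dir/sync_action"        # step 1
    wait_idle "$md_dir"
    if [ "$(next_action "$(read_mismatch "$md_dir")")" = "repair" ]; then  # step 2
        echo repair > "$md_dir/sync_action"   # step 3
        wait_idle "$md_dir"
        echo check > "$md_dir/sync_action"    # assumed meaning of step 4
        wait_idle "$md_dir"
    fi
    read_mismatch "$md_dir"                   # report the final count
}
```

Running the check first, rather than going straight to repair, keeps the mismatch count visible before anything is rewritten, which is the point made at the top of this message.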

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Jason Rainforest
I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5, multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200+). I then ordered a resync. The mismatch_cnt returned to 0 at the start of the resync, but around the same time that it went up to 8 with the check, it went

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Justin Piszcz
A resync? You're supposed to run a 'repair', are you not? Justin. On Sat, 24 Feb 2007, Jason Rainforest wrote: I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5, multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200+). I then ordered a resync. The

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Jason Rainforest
Yes, I meant repair, sorry. I checked my bash history and I did indeed order a repair (echo repair > /sys/block/md0/md/sync_action). I think I called it a resync because that's what /proc/mdstat told me it was doing. On Sat, 2007-02-24 at 04:50 -0500, Justin Piszcz wrote: A resync? You're

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Justin Piszcz
Ahh, perhaps Neil can fix that? ;) cat /sys/block/md0/md/sync_action will tell you what it is really doing. On Sat, 24 Feb 2007, Jason Rainforest wrote: Yes, I meant repair, sorry. I checked my bash history and I did indeed order a repair (echo repair > /sys/block/md0/md/sync_action). I think
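
As a small illustration of the suggestion above, the current action and the mismatch count can be read together from sysfs. This is a sketch only: md_status is a hypothetical helper name, and the directory argument is assumed to be the array's md sysfs directory (for example /sys/block/md0/md, as in this thread).

```shell
#!/bin/sh
# Print what the array is doing and its mismatch count on one line.
# md_status is an illustrative name, not part of mdadm or the kernel.
md_status() {
    printf 'sync_action=%s mismatch_cnt=%s\n' \
        "$(cat "$1/sync_action")" \
        "$(cat "$1/mismatch_cnt")"
}

# Example call against a real array:
#   md_status /sys/block/md0/md
```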

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Michael Tokarev
Jason Rainforest wrote: I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5, multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200+). I then ordered a resync. The mismatch_cnt returned to 0 at the start of
As pointed out later it was repair, not resync.

Re: nonzero mismatch_cnt with no earlier error

2007-02-24 Thread Justin Piszcz
On Sat, 24 Feb 2007, Michael Tokarev wrote: Jason Rainforest wrote: I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5, multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200+). I then ordered a resync. The mismatch_cnt returned to 0 at the start of As

nonzero mismatch_cnt with no earlier error

2007-02-23 Thread Eyal Lebedinsky
I run a 'check' weekly, and yesterday it came up with a non-zero mismatch count (184). There were no earlier RAID errors logged and the count was zero after the run a week ago. Now, the interesting part is that there was one i/o error logged during the check *last week*, however the raid did not

Re: nonzero mismatch_cnt with no earlier error

2007-02-23 Thread Eyal Lebedinsky
I did a resync since, which ended up with the same mismatch_cnt of 184. I noticed that the count *was* reset to zero when the resync started, but ended up with 184 (same as after the check). I thought that the resync just calculates fresh parity and does not bother checking if it is different. So