Ulf Dambacher wrote:
> 
> Hi at linux-raid
> 
> I just happened to get a major mdrecoveryd crash by NULL pointer dereference.
> The system is up and working though, with the raid disk mounted; system
> messages are below. So if you want any more information from the running
> system, email me, I will look at the mails at 22:15 again.
> 
> It is a 2.2.16-raid0.90 kernel without any other changes or patches.
> I configured a zip disk with 2 partitions as a raid1 autostart array and
> did some mounting/tarring/unmounting/raidstop/raidstart tests.
> Everything worked fine. Then I did the following:
> 
> Console 2: raidsetfaulty /dev/hdb2
> Console 2: raidhotremove /dev/hdb2
> Console 1: tar xvzf something to the raid array
> Console 2 while tar was running: raidhotadd /dev/hdb2
> 
> now when I do a cat /proc/mdstat it shows:
> 
> Personalities : [linear] [raid0] [raid1] [raid5] [translucent]
> read_ahead 1024 sectors
> md0 : active raid1 hdb2[2] hdb1[0] 49024 blocks [2/1] [U_] recovery=0% finish=10168.0min
> unused devices: <none>
> 
> this message doesn't change any more, except that the finish time
> continuously increases :-((
> 
> any comments?
> 
> bye
>         Ulf
> 
> ---- BANG! ------------ 8-( --------------- 8-( ----------------
> 
> Sep  5 21:23:42 ulda2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
> Sep  5 21:23:42 ulda2 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
> Sep  5 21:23:42 ulda2 kernel: *pde = 00000000
> Sep  5 21:23:42 ulda2 kernel: Oops: 0000
> Sep  5 21:23:42 ulda2 kernel: CPU:    0
> Sep  5 21:23:42 ulda2 kernel: EIP:    0010:[<00000000>]
> Sep  5 21:23:42 ulda2 kernel: EFLAGS: 00010002
> Sep  5 21:23:42 ulda2 kernel: eax: 00000000   ebx: 00000246   ecx: 00000001   edx: c0272d00
> Sep  5 21:23:42 ulda2 kernel: esi: 00000080   edi: c694a000   ebp: 00000000   esp: c7dd7f24
> Sep  5 21:23:42 ulda2 kernel: ds: 0018   es: 0018   ss: 0018
> Sep  5 21:23:42 ulda2 kernel: Process mdrecoveryd (pid: 6, process nr: 6, stackpage=c7dd7000)
> Sep  5 21:23:42 ulda2 kernel: Stack: c0192ce2 c0272d00 c7daf1e0 c4f41000 c7dd7fd4 c7daf22c 00000080 00000002
> Sep  5 21:23:42 ulda2 kernel:        c7daf1e0 c4f41000 c7dd6000 c7dd6000 00000004 c7dd6000 00000000 c02415e0
> Sep  5 21:23:42 ulda2 kernel:        00000000 0000007f 00090000 00000001 c7dd6000 00000024 00000900 c7daf238
> Sep  5 21:23:42 ulda2 kernel: Call Trace: [md_do_sync+1134/2700] [md_do_recovery+234/580] [md_thread+167/316] [kernel_thread+35/48]
> Sep  5 21:23:42 ulda2 kernel: Code: Bad EIP value.

I had the same problem and found the reason in the read-balancing code,
which doesn't recognize that a disk has been set faulty. So it tries to
read from a logically unavailable disk, and the read access to device
[00:00] generates the Oops.

With this small patch I could solve it on my machine:

--- drivers/block/raid1.c.orig  Thu Sep  7 18:20:46 2000
+++ drivers/block/raid1.c       Thu Sep  7 18:26:14 2000
@@ -367,8 +367,11 @@
        int i;
 
        for (i = 0; i < disks; i++)
-               if (conf->mirrors[i].next == target)
+               if (conf->mirrors[i].next == target) {
                        conf->mirrors[i].next = conf->mirrors[target].next;
+                       if (conf->last_used == target)
+                               conf->last_used = i;
+               }
 }
 
 #define LAST_DISK KERN_ALERT \
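
For anyone who wants to see the effect outside the kernel, here is a
minimal user-space sketch of that ring unlinking. Only the mirrors[].next
and last_used fields are the ones the patch touches; the struct and
function names below are made up for the demo and are not the kernel's:

#include <stdio.h>

/* simplified stand-ins for the raid1 per-array data */
struct mirror_info { int next; };       /* next slot in the read-balance ring */
struct raid1_conf {
        struct mirror_info mirrors[2];
        int last_used;                  /* slot the last read went to */
};

/* unlink "target" from the ring, with the two lines the patch adds */
static void unlink_mirror(struct raid1_conf *conf, int disks, int target)
{
        int i;

        for (i = 0; i < disks; i++)
                if (conf->mirrors[i].next == target) {
                        conf->mirrors[i].next = conf->mirrors[target].next;
                        /* without this, last_used keeps pointing at the
                         * removed slot, so the next balanced read goes to
                         * a device that is no longer there ([00:00]) */
                        if (conf->last_used == target)
                                conf->last_used = i;
                }
}

int main(void)
{
        /* two mirrors chained 0 -> 1 -> 0; the last read went to slot 1 (hdb2) */
        struct raid1_conf conf = { { { 1 }, { 0 } }, 1 };

        unlink_mirror(&conf, 2, 1);     /* raidsetfaulty /dev/hdb2 */
        printf("last_used = %d\n", conf.last_used);
        return 0;
}

With the two added lines this prints last_used = 0 (the surviving mirror);
without them it stays 1, which is the stale slot the resync read then
trips over.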



Klaudius