Title: Oops problem with RAID 1 and raidhotadd on kernel 2.2.16

I was running into occasional oops messages from the kernel with a RAID 1 set.

After some prolonged testing, I was finally able to reproduce the problem reliably with a specific sequence of commands.

Consider a running RAID 1 array made up of hdc1 and hdd1.  Partition size doesn't seem to matter (I created them at 100 MB each), and neither does IDE vs. SCSI.

raidsetfaulty /dev/md0 /dev/hdc1
raidhotremove /dev/md0 /dev/hdc1
raidhotadd /dev/md0 /dev/hdc1

About 1 out of 3 tries, I get an oops almost immediately, somewhere in the middle of md_do_sync().
If I reboot the computer, it recognizes the re-added drive and resyncs it correctly.
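
To hammer on it, a loop like the following works (a minimal sketch; it assumes the raidtools binaries are on the path and that nothing else is using md0) and usually triggers the oops within a few iterations:

#!/bin/sh
# Repeatedly fail, remove, and re-add hdc1, giving the resync that
# raidhotadd kicks off a chance to run (and possibly oops) each time.
i=1
while [ $i -le 10 ]; do
    raidsetfaulty /dev/md0 /dev/hdc1
    raidhotremove /dev/md0 /dev/hdc1
    raidhotadd    /dev/md0 /dev/hdc1
    sleep 60
    i=`expr $i + 1`
done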

If I create a RAID 5 array from hdb1, hdc1, and hdd1 and perform the same test, I do not seem to get the oops.

Preliminary research seems to point to this call in md_do_sync():

                run_task_queue(&tq_disk);
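
In 2.2, run_task_queue() detaches the queue and then calls each entry's routine through a function pointer, so a corrupted or half-initialized tq_struct whose routine is NULL sends the CPU straight to address 0, which matches the EIP of 00000000 and the "Code: Bad EIP value" in the capture below.  Paraphrased from memory (include/linux/tqueue.h), the loop looks roughly like this:

/* Approximate shape of run_task_queue() in 2.2; a sketch from
 * memory, not the literal kernel source. */
static inline void run_task_queue(task_queue *list)
{
        if (*list) {
                unsigned long flags;
                struct tq_struct *p;

                spin_lock_irqsave(&tqueue_lock, flags);
                p = *list;              /* detach the whole queue */
                *list = NULL;
                spin_unlock_irqrestore(&tqueue_lock, flags);

                while (p) {
                        void *arg = p->data;
                        void (*f)(void *) = p->routine;
                        struct tq_struct *save_p = p;

                        p = p->next;
                        save_p->sync = 0;
                        (*f)(arg);      /* f == NULL => jump to 0 => oops */
                }
        }
}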

Screen capture at the oops:

md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec.
md: using maximum available idle IO bandwith for reconstruction.
md: using 128k window.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<00000000>]
EFLAGS: 00010002
eax: 00000000   ebx: 00000246   ecx: 00000001   edx: c0256cdc
esi: c364bde0   edi: 00000080   ebp: c3efbfd8   esp: c3efbf38
ds: 0018   es: 0018   ss: 0018
Process mdrecoveryd (pid: 6, process nr: 6, stackpage=c3efb000)
Stack: c018f032 c0256cdc c3508340 c3769000 c0225f24 c3efbfd8 c0109d75 00000001
       c01099d8 00000000 000003fd 000003f9 c023ed30 00000036 00000001 00000000
       c02251ec 00000000 00000900 00000036 00001000 00001000 008d54fb 00000000
Call Trace: [<c018f032>] [<c0109d75>] [<c01099d8>] [<c018f6ca>] [<c018e38f>] [<c0203208>] [<c01074e3>]
       [<c018f5d8>]
Code: Bad EIP value.
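
The raw addresses are not much use by themselves; feeding the capture to ksymoops resolves the Call Trace against System.map (a sketch, assuming ksymoops is installed, the oops text is saved in oops.txt, and /boot/System.map matches the running 2.2.16 kernel):

# Decode the oops into symbolic function names.
ksymoops -m /boot/System.map < oops.txt

With symbols, it should be clearer which caller queued the bad tq_struct.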


I am no longer on the linux-raid mailing list because I can't seem to make Outlook send plain-text messages.  Please CC me on replies; I will also watch the linux-raid archives.

I need to fix this soon, so I will post anything else I find.
