Re: Newbie: What to do when a disk fails?

Mogens Kjaer Wed, 7 Jul 1999 23:54:44 -0700
Paul Jakma wrote:
....
> echo "scsi add-single-device c b t l" > /proc/scsi/scsi

I've tried this:

Insert 4 disks in system, 1 root, 3 in raid-5.

The raid is up and running.

I remove one of the disks in the raid (yes, they are supposed to
be hot-pluggable...).

The raid complains, and continues in degraded mode.

I run

raidhotremove /dev/md0 /dev/sdc1

This is ok.

echo "scsi remove-single-device 1 0 2 0" >/proc/scsi/scsi

(This disk is on controller 1 (controller 0 has a tape drive), unit 2).
No problem, the device has left the /proc/scsi/scsi list.
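
A quick way to confirm that an entry has really left the list is to
search for its host/channel/id/lun tuple. This is only a minimal sketch:
it assumes the usual "Host: scsi1 Channel: 00 Id: 02 Lun: 00" line
format, and the function name scsi_listed and its optional file argument
are made up for illustration.

```shell
#!/bin/sh
# scsi_listed HOST CHANNEL ID LUN [FILE]
# Succeeds (exit status 0) if the device appears in FILE
# (default: /proc/scsi/scsi), fails otherwise.
# Assumption: entries look like "Host: scsi1 Channel: 00 Id: 02 Lun: 00".
scsi_listed() {
    file=${5:-/proc/scsi/scsi}
    pattern=$(printf 'Host: scsi%d Channel: %02d Id: %02d Lun: %02d' \
        "$1" "$2" "$3" "$4")
    # -F: match the pattern as a fixed string, -q: no output, status only
    grep -qF "$pattern" "$file"
}
```

For the drive above this would be used as
`scsi_listed 1 0 2 0 || echo "scsi1 id 2 is gone"`.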

I insert a new drive in as unit 2.

echo "scsi add-single-device 1 0 2 0" >/proc/scsi/scsi

Then I'm immediately logged out!

The /var/log/messages contains the oops:

Jul  8 09:19:46 mail2 kernel: scsi singledevice 1 0 2 0
Jul  8 09:19:46 mail2 kernel: (scsi1:0:2:0) Synchronous at 20.0 Mbyte/sec, offset 8.
Jul  8 09:19:46 mail2 kernel:   Vendor: IBM       Model: XP32275W      !#  Rev: LYEB
Jul  8 09:19:46 mail2 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 02
Jul  8 09:19:46 mail2 kernel: Unable to handle kernel paging request at virtual address 8a042454
Jul  8 09:19:46 mail2 kernel: current->tss.cr3 = 04856000, %cr3 = 04856000
Jul  8 09:19:46 mail2 kernel: *pde = 00000000
Jul  8 09:19:46 mail2 kernel: Oops: 0002
Jul  8 09:19:46 mail2 kernel: CPU:    0
Jul  8 09:19:46 mail2 kernel: EIP:    0010:[requeue_sr_request+1539/1540]
Jul  8 09:19:46 mail2 kernel: EFLAGS: 00010286
Jul  8 09:19:46 mail2 kernel: eax: 00000000   ebx: 00000000   ecx: 00000001   edx: c01c1953
Jul  8 09:19:46 mail2 kernel: esi: c023fc40   edi: c0004ad0   ebp: c0017c00   esp: c481dd1c
Jul  8 09:19:46 mail2 kernel: ds: 0018   es: 0018   ss: 0018
Jul  8 09:19:46 mail2 kernel: Process bash (pid: 430, process nr: 25, stackpage=c481d000)
Jul  8 09:19:46 mail2 PAM_pwdb[422]: (login) session closed for user root
Jul  8 09:19:46 mail2 kernel: Stack: 00000000 c0004ad0 00000002 00000000 c00084e0 00000000 c481c000 c027cc00
Jul  8 09:19:46 mail2 kernel:        00000000 00000001 00000001 c0004ad0 c0004b0a 00000000 00000001 c0090000
Jul  8 09:19:46 mail2 kernel:        c001ae45 c481dd8c 001b1cbe c001ae00 c01b6f5c 00000400 00000000 00000000
Jul  8 09:19:46 mail2 kernel: Call Trace: [scsi_old_done+0/1388] [scan_scsis+461/1200] [do_IRQ+58/60] [__wake_up+43/60] [printk+362/376] [scsi_device_types+3700/5152] [scsi_proc_info+1394/1996]
Jul  8 09:19:46 mail2 kernel:        [scsi_device_types+3712/5152] [scsi_proc_info+1563/1996] [dispatch_scsi_info+52/152] [proc_writescsi+104/136] [sys_write+219/256] [proc_writescsi+0/136] [system_call+52/56]
Jul  8 09:19:46 mail2 kernel: Code: 00 8b 54 24 04 8a 42 30 04 fc 3c 01 77 3f 0f b6 42 1d 50 0f
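
For reference, the value after "Oops:" is the i386 page-fault error
code, whose low three bits can be decoded one by one. A small sketch of
that decoding follows; the function name decode_oops is made up for
illustration.

```shell
#!/bin/sh
# decode_oops CODE
# Decodes the low three bits of an i386 page-fault error code as printed
# after "Oops:" (hexadecimal, e.g. 0002).
#   bit 0: 0 = page not present, 1 = protection fault
#   bit 1: 0 = read,             1 = write
#   bit 2: 0 = kernel mode,      1 = user mode
decode_oops() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then cause="protection fault"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$cause, $access, $mode"
}

decode_oops 0002
```

For the code 0002 logged here this reports a write to a page that is not
present, made in kernel mode, which fits the "Unable to handle kernel
paging request" line.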

After this there is no connection to the drive: fdisk does not
recognize it, and when I issue the add-single-device command again,
it just hangs.

I notice that the newly inserted drive spins up when I reboot.
Normally, when the machine is switched on, the drives spin up when the
SCSI adapter BIOS scans for devices, and on a warm reboot they are
already running, except for the drive I plugged in, which only spins
up at that point.

Any suggestions for what I should try? Or do I have to live with a
reboot when a drive is lost? I have tried that, and it works OK with
raidhotremove before the reboot and raidhotadd after it.
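
The hot-swap sequence described in this message can be sketched as a
script. It only restates the commands used above; the function names and
the DRY_RUN variable are made up for illustration, and the new disk is
assumed to already carry a matching partition. Note that the
add-single-device step is exactly where the oops above occurred, so this
is a sketch of the intended procedure, not a tested one.

```shell
#!/bin/sh
# Sketch of the hot-swap sequence from this message.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# scsi_cmd ACTION HOST CHANNEL ID LUN
# Prints the command string written to /proc/scsi/scsi.
scsi_cmd() {
    echo "scsi $1-single-device $2 $3 $4 $5"
}

# run CMD...: print the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# scsi_write ACTION HOST CHANNEL ID LUN: send a command to the SCSI layer.
scsi_write() {
    line=$(scsi_cmd "$@")
    echo "+ echo \"$line\" > /proc/scsi/scsi"
    [ "$DRY_RUN" = 1 ] || echo "$line" > /proc/scsi/scsi
}

# swap_disk MD PARTITION HOST CHANNEL ID LUN
swap_disk() {
    md=$1; part=$2; shift 2
    run raidhotremove "$md" "$part"   # drop the failed member from the array
    scsi_write remove "$@"            # detach the old drive
    printf 'Replace the drive, then press Enter: '
    read -r answer
    scsi_write add "$@"               # attach the new drive
    run raidhotadd "$md" "$part"      # add the new member back to the array
}
```

For the setup above the call would be
`swap_disk /dev/md0 /dev/sdc1 1 0 2 0`; leaving DRY_RUN at 1 shows the
commands without touching the array or the SCSI layer.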

Mogens
-- 
Mogens Kjaer, Carlsberg Laboratory, Dept. of Chemistry
Gamle Carlsberg Vej 10, DK-2500 Valby, Denmark
Phone: +45 33 27 53 25, Fax: +45 33 27 47 08
Email: [EMAIL PROTECTED] Homepage: http://www.crc.dk