My server at home crashed with this in kern.log. The cause seems to be the
SCSI disks that carry the LVM. It is as if they went to sleep and never came
back to answer an IRQ. This is the second time it has happened in 2 weeks.
Note that the 5th device on the SCSI chain is an 8mm tape drive that I power
off after the weekly backup (it has the active terminator).
If I boot it normally after a clean shutdown, LVM finds no VG when scanning,
and I have to restore the VGDA on each HDD (/dev/sdaN1). After that, a vgscan
sees the whole VG and the LVs are active. I can then work normally on the
reiserfs LVs (resize etc. work fine):
vgscan
vgdisplay -- no volume groups found
vgcfgrestore -f /etc/lvmconf/doc_vg.conf -n doc_vg /dev/sda1
vgcfgrestore -- VGDA for "doc_vg" successfully restored to physical volume "/dev/sda1"
...
vgscan
vgchange -ay
vgchange -- volume group "doc_vg" successfully activated
[EMAIL PROTECTED]:/LOG# vgdisplay
--- Volume group ---
VG Name               doc_vg
VG Access             read/write
VG Status             available/resizable
VG #                  0
MAX LV                256
Cur LV                2
Open LV               0
MAX LV Size           2 TB
Max PV                256
Cur PV                4
Act PV                4
VG Size               16.62 GB
PE Size               32 MB
Total PE              532
Alloc PE / Size       288 / 9 GB
Free PE / Size        244 / 7.62 GB
VG UUID               gcyqUX-080P-l7px-arts-BEA3-v1UG-YH9ec2
[EMAIL PROTECTED]:/LOG# mount -a
[EMAIL PROTECTED]:/LOG# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1              1269056   1151896    117160  91% /
/dev/hda2              2947828   1141336   1806492  39% /data
/dev/hdb2               814432     32896    781536   5% /data/ftp
/dev/doc_vg/lv_cd01    6291260   2599396   3691864  42% /data/www/documentation
/dev/doc_vg/lv_pg_data
                       3145628    327912   2817716  11% /var/lib/postgres/data
and then it's OK.
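The manual recovery above can be scripted. What follows is a minimal POSIX sh sketch, not the poster's actual procedure: it assumes the four PVs are /dev/sda1 to /dev/sdd1 (the post only writes /dev/sdaN1 and elides the repeated commands) and that the single backup /etc/lvmconf/doc_vg.conf is the right one for every PV. The function names run and restore_vg and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper replaying the recovery described above:
# restore the VGDA of doc_vg on each PV, rescan, then activate the VG.
# ASSUMPTION: the PVs are /dev/sda1../dev/sdd1; adjust PVS to match.
# With DRY_RUN=1 the commands are only printed, never executed.

VG=doc_vg
CONF=/etc/lvmconf/doc_vg.conf
PVS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restore_vg() {
    # One vgcfgrestore per physical volume, stopping at the first failure.
    for pv in $PVS; do
        run vgcfgrestore -f "$CONF" -n "$VG" "$pv" || return 1
    done
    run vgscan || return 1
    run vgchange -ay
}
```

Source the file and call `DRY_RUN=1 restore_vg` to print the six commands without touching the disks; without DRY_RUN they are executed, after which `mount -a` can be run as in the session above.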
That's a first problem: not blocking, but one to solve ASAP. Last week's
crash happened when I added PPs to an LV: one of the 4 disks (sdc1) still had
all its PPs free, and during the resize, crash/bang. After unmounting
everything and removing the scsi and lvm modules, I switched my SCSI tower off
and back on, ran modprobe again and remounted everything (after restoring the
VGDA anyway), and it was then able to extend the LV and the reiserfs, as if
disk sdc were fine.
This morning it happened again, but this time during file I/O.
There are four 4 GB disks salvaged from dead pSeries machines and an Exabyte 8mm tape drive.
Here is the dump (sorry about the size), in case you've already run into
this... I'll keep looking on my side tonight:
Oct 24 05:02:38 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT message
Oct 24 05:02:38 nabiki kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x9d
Oct 24 05:02:38 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0xff
Oct 24 05:02:38 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
Oct 24 05:02:38 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
Oct 24 05:02:38 nabiki kernel: DFCNTRL = 0x0, DFSTATUS = 0x28
Oct 24 05:02:38 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x44, SXFRCTL0 = 0xa8
Oct 24 05:02:38 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
Oct 24 05:02:38 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
Oct 24 05:02:38 nabiki kernel: SCB count = 20
Oct 24 05:02:38 nabiki kernel: Kernel NEXTQSCB = 5
Oct 24 05:02:38 nabiki kernel: Card NEXTQSCB = 11
Oct 24 05:02:38 nabiki kernel: QINFIFO entries: 11
Oct 24 05:02:38 nabiki kernel: Waiting Queue entries:
Oct 24 05:02:38 nabiki kernel: Disconnected Queue entries:
Oct 24 05:02:38 nabiki kernel: QOUTFIFO entries:
Oct 24 05:02:38 nabiki kernel: Sequencer Free SCB List: 2 0
Oct 24 05:02:38 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
Oct 24 05:02:38 nabiki kernel: Pending list: 11(c 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
Oct 24 05:02:38 nabiki kernel: Kernel Free SCB list: 14 2 9 13 4 3 1 19 7 8 10 6 12 15 18 17 16
Oct 24 05:02:38 nabiki kernel: DevQ(0:0:0): 0 waiting
Oct 24 05:02:38 nabiki kernel: DevQ(0:2:0): 0 waiting
Oct 24 05:02:38 nabiki kernel: DevQ(0:3:0): 0 waiting
Oct 24 05:02:38 nabiki kernel: DevQ(0:5:0): 0 waiting
Oct 24 05:02:38 nabiki kernel: DevQ(0:6:0): 0 waiting
Oct 24 05:02:38 nabiki kernel: scsi0:0:2:0: Device is active, asserting ATN
Oct 24 05:02:38 nabiki kernel: Recovery code sleeping
Oct 24 05:02:38 nabiki kernel: Recovery code awake
Oct 24 05:02:38 nabiki kernel: aic7xxx_abort returns 0x2002
Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT message
Oct 24 05:02:48 nabiki kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x9d
Oct 24 05:02:48 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0xff
Oct 24 05:02:48 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
Oct 24 05:02:48 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
Oct 24 05:02:48 nabiki kernel: DFCNTRL = 0x0, DFSTATUS = 0x28
Oct 24 05:02:48 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x54, SXFRCTL0 = 0xa8
Oct 24 05:02:48 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
Oct 24 05:02:48 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
Oct 24 05:02:48 nabiki kernel: SCB count = 20
Oct 24 05:02:48 nabiki kernel: Kernel NEXTQSCB = 14
Oct 24 05:02:48 nabiki kernel: Card NEXTQSCB = 11
Oct 24 05:02:48 nabiki kernel: QINFIFO entries: 11 5
Oct 24 05:02:48 nabiki kernel: Waiting Queue entries:
Oct 24 05:02:48 nabiki kernel: Disconnected Queue entries:
Oct 24 05:02:48 nabiki kernel: QOUTFIFO entries:
Oct 24 05:02:48 nabiki kernel: Sequencer Free SCB List: 2 0
Oct 24 05:02:48 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
Oct 24 05:02:48 nabiki kernel: Pending list: 5(c 0x68, s 0x27, l 0), 11(c 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
Oct 24 05:02:48 nabiki kernel: Kernel Free SCB list: 2 9 13 4 3 1 19 7 8 10 6 12 15 18 17 16
Oct 24 05:02:48 nabiki kernel: DevQ(0:0:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: DevQ(0:2:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: DevQ(0:3:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: DevQ(0:5:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: DevQ(0:6:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Cmd aborted from QINFIFO
Oct 24 05:02:48 nabiki kernel: aic7xxx_abort returns 0x2002
Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT message
Oct 24 05:02:48 nabiki kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x9d
Oct 24 05:02:48 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0xff
Oct 24 05:02:48 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
Oct 24 05:02:48 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
Oct 24 05:02:48 nabiki kernel: DFCNTRL = 0x0, DFSTATUS = 0x28
Oct 24 05:02:48 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x54, SXFRCTL0 = 0xa8
Oct 24 05:02:48 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
Oct 24 05:02:48 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
Oct 24 05:02:48 nabiki kernel: SCB count = 20
Oct 24 05:02:48 nabiki kernel: Kernel NEXTQSCB = 11
Oct 24 05:02:48 nabiki kernel: Card NEXTQSCB = 14
Oct 24 05:02:48 nabiki kernel: QINFIFO entries: 14
Oct 24 05:02:48 nabiki kernel: Waiting Queue entries:
Oct 24 05:02:48 nabiki kernel: Disconnected Queue entries:
Oct 24 05:02:48 nabiki kernel: QOUTFIFO entries:
Oct 24 05:02:48 nabiki kernel: Sequencer Free SCB List: 2 0
Oct 24 05:02:48 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
Oct 24 05:02:48 nabiki kernel: Pending list: 14(c 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
Oct 24 05:02:48 nabiki kernel: Kernel Free SCB list: 5 2 9 13 4 3 1 19 7 8 10 6 12 15 18 17 16
Oct 24 05:02:48 nabiki kernel: DevQ(0:0:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: DevQ(0:2:0): 0 waiting
Oct 24 05:02:48 nabiki kernel: DevQ(0:3:0): 0 waiting
...
Oct 24 05:04:44 nabiki kernel: scsi0:0:2:0: Cmd aborted from QINFIFO
Oct 24 05:04:44 nabiki kernel: aic7xxx_abort returns 0x2002
Oct 24 05:04:44 nabiki kernel: scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 2 lun 0
Oct 24 05:04:44 nabiki kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 50000
Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 3949040
Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 3949048
Oct 24 05:04:44 nabiki kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 3f0000
Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 4012624
Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 4012632
Oct 24 05:04:44 nabiki kernel: journal-601, buffer write failed
Oct 24 05:04:44 nabiki kernel: kernel BUG at prints.c:334!
Oct 24 05:04:44 nabiki kernel: invalid operand: 0000
Oct 24 05:04:44 nabiki kernel: CPU: 0
Oct 24 05:04:44 nabiki kernel: EIP: 0010:[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2793383/96] Not tainted
Oct 24 05:04:44 nabiki kernel: EFLAGS: 00010282
Oct 24 05:04:44 nabiki kernel: eax: 00000024 ebx: d08a8340 ecx: 00000001 edx: 00000001
Oct 24 05:04:44 nabiki kernel: esi: c50abc00 edi: c50abc00 ebp: 0000000d esp: c13a3ee0
Oct 24 05:04:44 nabiki kernel: ds: 0018 es: 0018 ss: 0018
Oct 24 05:04:44 nabiki kernel: Process kupdated (pid: 6, stackpage=c13a3000)
Oct 24 05:04:44 nabiki kernel: Stack: d08a67da d08aa420 d08a8340 c13a3f04 d0d7ad88 00000000 d089f0be c50abc00
Oct 24 05:04:44 nabiki kernel: d08a8340 00000025 00000012 00000010 00000000 d0d7adbc d0d7adb0 0000000e
Oct 24 05:04:44 nabiki kernel: 00000000 c77432c0 d08a27be c50abc00 d0d7ad88 00000001 c13a3f98 c50abc00
Oct 24 05:04:44 nabiki kernel: Call Trace:
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2721830/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2706400/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2714816/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2752322/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2714816/96]
Oct 24 05:04:44 nabiki kernel: [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2738242/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2741571/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2710961/96]
[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2803643/96]
[sync_supers+222/288] [sync_old_buffers+14/68]
Oct 24 05:04:44 nabiki kernel: [kupdate+217/252] [kernel_thread+40/56]
Oct 24 05:04:44 nabiki kernel:
Oct 24 05:04:44 nabiki kernel: Code: 0f 0b 4e 01 e0 67 8a d0 68 20 a4 8a d0 85 f6 74 16 0f b7 46
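The I/O errors identify the failing block device only as `dev 08:11`. Assuming the 2.4 kernel convention of printing major:minor in hexadecimal, that is major 8 (SCSI disks) and minor 17, i.e. disk index 1, partition 1: /dev/sdb1. That would be consistent with the aborts hitting SCSI ID 2 if the disk at ID 0 is sda. A small POSIX sh sketch for decoding such numbers; decode_sd is a hypothetical helper and only handles sd major 8 (the first 16 disks, 16 minors each):

```shell
#!/bin/sh
# Hypothetical helper: decode a 2.4-kernel "dev MM:mm" string into an sd name.
# ASSUMPTIONS: major and minor are printed in hex, the device is on sd
# major 8, and minor = 16 * disk_index + partition (partition 0 = whole disk).

decode_sd() {
    # Split "MM:mm" and parse both halves as hexadecimal.
    major=$((0x${1%%:*}))
    minor=$((0x${1##*:}))
    if [ "$major" -ne 8 ]; then
        echo "not an sd major-8 device: $1" >&2
        return 1
    fi
    disk=$((minor / 16))   # 16 minors per disk
    part=$((minor % 16))   # 0 means the whole disk
    # Map disk index 0..15 to the letters a..p.
    letter=$(printf '%s\n' abcdefghijklmnop | cut -c "$((disk + 1))")
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}
```

For example, `decode_sd 08:11` prints `sdb1`.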
_______________________________________________________
Linux Mailing List - http://www.unixtech.be
Subscribe/Unsubscribe: http://www.unixtech.be/mailman/listinfo/linux
Archives: http://www.mail-archive.com/[EMAIL PROTECTED]
IRC: efnet.unixtech.be:6667 - #unixtech