On Tue, 28 Oct 2003, Jean-Francois Dive wrote:
> Do you have ksymoops? Backups go to log files in /var/log/ksymoops/`date`,
> and it decodes the tracebacks. Which kernel is this? Stock Debian? If so,
> send a mail to [EMAIL PROTECTED] and ask him whether he knows what it
> is. The problem is in the raid module, but is that because the SCSI error
> isn't handled properly, or is it really an LVM problem? That is the
> question. Personally, I really don't know the FS layer well, so I can't
> say much.
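A minimal sketch of the ksymoops route, written as a POSIX shell function (the name `extract_oops` is hypothetical). It pulls each oops out of a kernel log, from the `kernel BUG at` or `Unable to handle kernel` line down to the `Code:` line, and strips any syslog prefix so ksymoops gets the bare oops text. The kern.log copy of the oops quoted further down was already symbolized by klogd, and its `md.o_+-NNNN` entries carry negative offsets, which suggests that lookup is not trustworthy. The raw `[<d08a67da>]` addresses in the dmesg copy are what ksymoops expects.

```shell
#!/bin/sh
# Hypothetical helper: print every oops found in a kernel log (file
# arguments or stdin), from the "kernel BUG at" / "Unable to handle kernel"
# line through the "Code:" line, followed by a blank separator line.
# A leading syslog prefix such as "Oct 24 05:04:44 nabiki kernel: " is removed.
extract_oops() {
    sed -e 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]* [^ ]* kernel: *//' "$@" |
    awk '
        /kernel BUG at|Unable to handle kernel/ { inside = 1 }
        inside { print }
        inside && /^Code:/ { inside = 0; print "" }
    '
}
```

A possible invocation, assuming the Debian System.map location and one of the /var/log/ksymoops snapshots taken while the same modules were loaded (`<date><time>` is a placeholder): `dmesg | extract_oops > /tmp/oops.txt`, then `ksymoops -m /boot/System.map-2.4.20-k6 -k /var/log/ksymoops/<date><time>.ksyms -l /var/log/ksymoops/<date><time>.modules < /tmp/oops.txt`.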
It's Debian testing with a 2.4.20-k6 kernel. I have ksymoops, but I see
nothing in yesterday's log that gives me a lead. I have quite a few
<date><time>.ksyms and .modules files, but I don't see any errors in them.
Here is the tail of a dmesg I just took:
journal-601, buffer write failed
kernel BUG at prints.c:334!
invalid operand: 0000
CPU: 0
EIP: 0010:[<d0895059>] Not tainted
EFLAGS: 00010282
eax: 00000024 ebx: d08a8340 ecx: 00000001 edx: 00000001
esi: c28a2000 edi: c28a2000 ebp: 00000009 esp: c13a3ee0
ds: 0018 es: 0018 ss: 0018
Process kupdated (pid: 6, stackpage=c13a3000)
Stack: d08a67da d08aa420 d08a8340 c13a3f04 d0d7b32c 00000000 d089f0be c28a2000
       d08a8340 00000038 00000012 00000010 00000000 d0d7b360 d0d7b354 0000000a
       00000000 c7896ce0 d08a27be c28a2000 d0d7b32c 00000001 c13a3f98 c28a2000
Call Trace: [<d08a67da>] [<d08aa420>] [<d08a8340>] [<d089f0be>] [<d08a8340>]
  [<d08a27be>] [<d08a1abd>] [<d08a924f>] [<d0892845>] [<c01355a2>] [<c0134bd2>]
  [<c0134e35>] [<c010705c>]
Code: 0f 0b 4e 01 e0 67 8a d0 68 20 a4 8a d0 85 f6 74 16 0f b7 46
I/O error: dev 08:11, sector 4042256
But the disks themselves were fine when I ran mkreiserfs: no errors.
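The `I/O error: dev 08:11` lines are easier to act on once the device number is decoded. 2.4 kernels print it as hexadecimal major:minor; block major 8 is the SCSI disk driver with 16 minors per disk, so 08:11 is minor 17, i.e. sdb1. Assuming the disks are lettered in SCSI ID order and the tape is the last of the five IDs listed in the DevQ lines of the kern.log dump (0, 2, 3, 5, 6), sdb would be the disk at ID 2, the target the aic7xxx driver later takes offline. A hypothetical helper (`kdev_to_sd` is a made-up name) doing the conversion for major 8 only:

```shell
#!/bin/sh
# Hypothetical helper: translate the hex "MM:mm" device number that 2.4
# kernels print in "I/O error: dev MM:mm" into an sd name.
# Only block major 8 (the first 16 SCSI disks, sda..sdp) is handled.
kdev_to_sd() {
    major=$((0x${1%%:*}))
    minor=$((0x${1##*:}))
    if [ "$major" -ne 8 ]; then
        echo "unsupported major $major" >&2
        return 1
    fi
    disk=$((minor / 16))   # 16 minors per disk on major 8
    part=$((minor % 16))   # 0 means the whole disk
    letter=$(printf '%s' abcdefghijklmnop | cut -c$((disk + 1)))
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}
```

For example, `kdev_to_sd 08:11` prints `sdb1`.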
>
> On Mon, 2003-10-27 at 18:58, Vincent Jamart wrote:
> > My home server crashed with this in kern.log. The cause seems to be the
> > SCSI disks that carry the LVM. It is as if they go to sleep and never
> > come back with an IRQ. This is the second time it has happened in two
> > weeks. Note that the fifth device on the SCSI chain is an 8mm tape drive
> > that I power off after the weekly backup (it has the active terminator).
> >
> > If I boot it normally after a clean shutdown, LVM finds no VG when it
> > scans, and I have to restore the VGDA on each disk (/dev/sdaN1). After
> > that, vgscan sees the whole VG and the LVs are active. I can then work
> > normally on the reiserfs LVs (resize etc. work fine):
> >
> > vgscan
> > vgdisplay -- no volume groups found
> > vgcfgrestore -f /etc/lvmconf/doc_vg.conf -n doc_vg /dev/sda1
> > vgcfgrestore -- VGDA for "doc_vg" successfully restored to physical volume
> > "/dev/sda1"
> > ...
> > vgscan
> > vgchange -ay
> > vgchange -- volume group "doc_vg" successfully activated
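The per-disk restore above can be wrapped in a small script so that no disk is forgotten after a reboot. This is a sketch: the PV list (`/dev/sda1` to `/dev/sdd1`, one partition on each of the four disks) is an assumption, and `DRY_RUN` and `restore_vg` are hypothetical names. The `vgcfgrestore`, `vgscan` and `vgchange` invocations mirror the ones in the transcript.

```shell
#!/bin/sh
# Hypothetical wrapper: restore the VGDA of doc_vg on every PV, then
# rescan and activate the VG. The PV list is an ASSUMPTION; adjust it.
# Set DRY_RUN=1 to print the commands instead of running them.
VG=doc_vg
CONF=/etc/lvmconf/doc_vg.conf
PVS=${PVS:-"/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restore_vg() {
    # One vgcfgrestore per physical volume; stop at the first failure.
    for pv in $PVS; do
        run vgcfgrestore -f "$CONF" -n "$VG" "$pv" || return 1
    done
    run vgscan && run vgchange -ay "$VG"
}
```

`DRY_RUN=1 restore_vg` lists the six commands; without it they run for real, as root.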
> > [EMAIL PROTECTED]:/LOG# vgdisplay
> > --- Volume group ---
> > VG Name doc_vg
> > VG Access read/write
> > VG Status available/resizable
> > VG # 0
> > MAX LV 256
> > Cur LV 2
> > Open LV 0
> > MAX LV Size 2 TB
> > Max PV 256
> > Cur PV 4
> > Act PV 4
> > VG Size 16.62 GB
> > PE Size 32 MB
> > Total PE 532
> > Alloc PE / Size 288 / 9 GB
> > Free PE / Size 244 / 7.62 GB
> > VG UUID gcyqUX-080P-l7px-arts-BEA3-v1UG-YH9ec2
> >
> > [EMAIL PROTECTED]:/LOG# mount -a
> > [EMAIL PROTECTED]:/LOG# df
> > Filesystem           1K-blocks      Used Available Use% Mounted on
> > /dev/hda1              1269056   1151896    117160  91% /
> > /dev/hda2              2947828   1141336   1806492  39% /data
> > /dev/hdb2               814432     32896    781536   5% /data/ftp
> > /dev/doc_vg/lv_cd01    6291260   2599396   3691864  42% /data/www/documentation
> > /dev/doc_vg/lv_pg_data
> >                        3145628    327912   2817716  11% /var/lib/postgres/data
> >
> > and then everything is OK.
> >
> > That is a first problem, not a blocking one, but it needs solving ASAP.
> > Last week's crash happened when I added PPs to an LV: one of the 4 disks
> > (sdc1) had all its PPs free, and during the resize, crash, bang. After
> > unmounting everything and removing the scsi and lvm modules, I powered my
> > SCSI tower off and back on, ran modprobe again and remounted everything
> > (after restoring the VGDA anyway), and it was then able to extend the LV
> > and the reiserfs filesystem, as if the sdc disk were fine.
> > This morning, same thing again, but during file I/O.
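The workaround above ends with growing the LV and then the reiserfs filesystem on it. A minimal sketch of that grow step, assuming LVM1's `lvextend -L` and reiserfsprogs' `resize_reiserfs`, which without `-s` grows the filesystem to the size of the underlying device. The LV path and size in the usage line are placeholders, and `grow_lv` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch: grow an LV, then the reiserfs on it.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

grow_lv() {
    lv=$1      # placeholder, e.g. /dev/doc_vg/lv_cd01
    delta=$2   # placeholder, e.g. +1G, passed as-is to lvextend -L
    # Grow the logical volume first; stop if LVM refuses.
    run lvextend -L "$delta" "$lv" || return 1
    # Without -s, resize_reiserfs grows the filesystem to the new LV size.
    run resize_reiserfs "$lv"
}
```

For example, `DRY_RUN=1 grow_lv /dev/doc_vg/lv_cd01 +1G` only prints the two commands. Growing the LV before the filesystem is the safe order: the filesystem never claims space the device does not have yet.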
> >
> > There are four 4 GB disks salvaged from dead pSeries machines, plus an Exabyte 8mm tape drive.
> >
> > Here is the dump (sorry about the size), in case you have already seen
> > this... I'm digging on my side tonight:
> >
> > Oct 24 05:02:38 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT message
> > Oct 24 05:02:38 nabiki kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x9d
> > Oct 24 05:02:38 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0xff
> > Oct 24 05:02:38 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
> > Oct 24 05:02:38 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> > Oct 24 05:02:38 nabiki kernel: DFCNTRL = 0x0, DFSTATUS = 0x28
> > Oct 24 05:02:38 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x44, SXFRCTL0 = 0xa8
> > Oct 24 05:02:38 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
> > Oct 24 05:02:38 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
> > Oct 24 05:02:38 nabiki kernel: SCB count = 20
> > Oct 24 05:02:38 nabiki kernel: Kernel NEXTQSCB = 5
> > Oct 24 05:02:38 nabiki kernel: Card NEXTQSCB = 11
> > Oct 24 05:02:38 nabiki kernel: QINFIFO entries: 11
> > Oct 24 05:02:38 nabiki kernel: Waiting Queue entries:
> > Oct 24 05:02:38 nabiki kernel: Disconnected Queue entries:
> > Oct 24 05:02:38 nabiki kernel: QOUTFIFO entries:
> > Oct 24 05:02:38 nabiki kernel: Sequencer Free SCB List: 2 0
> > Oct 24 05:02:38 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
> > Oct 24 05:02:38 nabiki kernel: Pending list: 11(c 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
> > Oct 24 05:02:38 nabiki kernel: Kernel Free SCB list: 14 2 9 13 4 3 1 19 7 8 10 6 12 15 18 17 16
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:0:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:2:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:3:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:5:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:6:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: scsi0:0:2:0: Device is active, asserting ATN
> > Oct 24 05:02:38 nabiki kernel: Recovery code sleeping
> > Oct 24 05:02:38 nabiki kernel: Recovery code awake
> > Oct 24 05:02:38 nabiki kernel: aic7xxx_abort returns 0x2002
> > Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT message
> > Oct 24 05:02:48 nabiki kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x9d
> > Oct 24 05:02:48 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0xff
> > Oct 24 05:02:48 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
> > Oct 24 05:02:48 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> > Oct 24 05:02:48 nabiki kernel: DFCNTRL = 0x0, DFSTATUS = 0x28
> > Oct 24 05:02:48 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x54, SXFRCTL0 = 0xa8
> > Oct 24 05:02:48 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
> > Oct 24 05:02:48 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
> > Oct 24 05:02:48 nabiki kernel: SCB count = 20
> > Oct 24 05:02:48 nabiki kernel: Kernel NEXTQSCB = 14
> > Oct 24 05:02:48 nabiki kernel: Card NEXTQSCB = 11
> > Oct 24 05:02:48 nabiki kernel: QINFIFO entries: 11 5
> > Oct 24 05:02:48 nabiki kernel: Waiting Queue entries:
> > Oct 24 05:02:48 nabiki kernel: Disconnected Queue entries:
> > Oct 24 05:02:48 nabiki kernel: QOUTFIFO entries:
> > Oct 24 05:02:48 nabiki kernel: Sequencer Free SCB List: 2 0
> > Oct 24 05:02:48 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
> > Oct 24 05:02:48 nabiki kernel: Pending list: 5(c 0x68, s 0x27, l 0), 11(c 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
> > Oct 24 05:02:48 nabiki kernel: Kernel Free SCB list: 2 9 13 4 3 1 19 7 8 10 6 12 15 18 17 16
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:0:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:2:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:3:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:5:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:6:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Cmd aborted from QINFIFO
> > Oct 24 05:02:48 nabiki kernel: aic7xxx_abort returns 0x2002
> > Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT message
> > Oct 24 05:02:48 nabiki kernel: scsi0: Dumping Card State in Data-in phase, at SEQADDR 0x9d
> > Oct 24 05:02:48 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0xff
> > Oct 24 05:02:48 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
> > Oct 24 05:02:48 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> > Oct 24 05:02:48 nabiki kernel: DFCNTRL = 0x0, DFSTATUS = 0x28
> > Oct 24 05:02:48 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x54, SXFRCTL0 = 0xa8
> > Oct 24 05:02:48 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
> > Oct 24 05:02:48 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
> > Oct 24 05:02:48 nabiki kernel: SCB count = 20
> > Oct 24 05:02:48 nabiki kernel: Kernel NEXTQSCB = 11
> > Oct 24 05:02:48 nabiki kernel: Card NEXTQSCB = 14
> > Oct 24 05:02:48 nabiki kernel: QINFIFO entries: 14
> > Oct 24 05:02:48 nabiki kernel: Waiting Queue entries:
> > Oct 24 05:02:48 nabiki kernel: Disconnected Queue entries:
> > Oct 24 05:02:48 nabiki kernel: QOUTFIFO entries:
> > Oct 24 05:02:48 nabiki kernel: Sequencer Free SCB List: 2 0
> > Oct 24 05:02:48 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
> > Oct 24 05:02:48 nabiki kernel: Pending list: 14(c 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
> > Oct 24 05:02:48 nabiki kernel: Kernel Free SCB list: 5 2 9 13 4 3 1 19 7 8 10 6 12 15 18 17 16
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:0:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:2:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:3:0): 0 waiting
> > ...
> > Oct 24 05:04:44 nabiki kernel: scsi0:0:2:0: Cmd aborted from QINFIFO
> > Oct 24 05:04:44 nabiki kernel: aic7xxx_abort returns 0x2002
> > Oct 24 05:04:44 nabiki kernel: scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 2 lun 0
> > Oct 24 05:04:44 nabiki kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 50000
> > Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 3949040
> > Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 3949048
> > Oct 24 05:04:44 nabiki kernel: SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 3f0000
> > Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 4012624
> > Oct 24 05:04:44 nabiki kernel: I/O error: dev 08:11, sector 4012632
> > Oct 24 05:04:44 nabiki kernel: journal-601, buffer write failed
> > Oct 24 05:04:44 nabiki kernel: kernel BUG at prints.c:334!
> > Oct 24 05:04:44 nabiki kernel: invalid operand: 0000
> > Oct 24 05:04:44 nabiki kernel: CPU: 0
> > Oct 24 05:04:44 nabiki kernel: EIP: 0010:[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2793383/96] Not tainted
> > Oct 24 05:04:44 nabiki kernel: EFLAGS: 00010282
> > Oct 24 05:04:44 nabiki kernel: eax: 00000024 ebx: d08a8340 ecx: 00000001 edx: 00000001
> > Oct 24 05:04:44 nabiki kernel: esi: c50abc00 edi: c50abc00 ebp: 0000000d esp: c13a3ee0
> > Oct 24 05:04:44 nabiki kernel: ds: 0018 es: 0018 ss: 0018
> > Oct 24 05:04:44 nabiki kernel: Process kupdated (pid: 6, stackpage=c13a3000)
> > Oct 24 05:04:44 nabiki kernel: Stack: d08a67da d08aa420 d08a8340 c13a3f04 d0d7ad88 00000000 d089f0be c50abc00
> > Oct 24 05:04:44 nabiki kernel: d08a8340 00000025 00000012 00000010 00000000 d0d7adbc d0d7adb0 0000000e
> > Oct 24 05:04:44 nabiki kernel: 00000000 c77432c0 d08a27be c50abc00 d0d7ad88 00000001 c13a3f98 c50abc00
> > Oct 24 05:04:44 nabiki kernel: Call Trace: [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2721830/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2706400/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2714816/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2752322/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2714816/96]
> > Oct 24 05:04:44 nabiki kernel: [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2738242/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2741571/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2710961/96] [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2803643/96] [sync_supers+222/288] [sync_old_buffers+14/68]
> > Oct 24 05:04:44 nabiki kernel: [kupdate+217/252] [kernel_thread+40/56]
> > Oct 24 05:04:44 nabiki kernel:
> > Oct 24 05:04:44 nabiki kernel: Code: 0f 0b 4e 01 e0 67 8a d0 68 20 a4 8a d0 85 f6 74 16 0f b7 46
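With a dump this size, it can help to tally the events before reading the card-state details. A hypothetical awk-based helper (the function name and output format are made up for illustration) that counts aic7xxx ABORT attempts per `scsiH:C:T:L` address and 2.4-style `I/O error: dev MM:mm` messages per device number:

```shell
#!/bin/sh
# Hypothetical triage helper for a 2.4 kern.log (file arguments or stdin).
# Prints "abort <scsi address> <count>" and "ioerr <dev> <count>" lines, sorted.
summarize_scsi_errors() {
    awk '
        /Attempting to queue an ABORT/ {
            # The target appears as a field like "scsi0:0:2:0:"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^scsi[0-9]+:[0-9]+:[0-9]+:[0-9]+:$/) {
                    addr = substr($i, 1, length($i) - 1)
                    aborts[addr]++
                }
        }
        /I\/O error: dev/ {
            # "... I/O error: dev 08:11, sector N": take the word after "dev"
            for (i = 1; i <= NF; i++)
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)
                    ioerr[dev]++
                }
        }
        END {
            for (a in aborts) printf "abort %s %d\n", a, aborts[a]
            for (d in ioerr) printf "ioerr %s %d\n", d, ioerr[d]
        }
    ' "$@" | sort
}
```

For example, `summarize_scsi_errors /var/log/kern.log`. In the excerpt above, every abort concerns scsi0:0:2:0 and every I/O error concerns dev 08:11.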
> > _______________________________________________________
> > Linux Mailing List - http://www.unixtech.be
> > Subscribe/Unsubscribe: http://www.unixtech.be/mailman/listinfo/linux
> > Archives: http://www.mail-archive.com/[EMAIL PROTECTED]
> > IRC: efnet.unixtech.be:6667 - #unixtech
>