On Tue, 28 Oct 2003, Jean-Francois Dive wrote:

> Do you have ksymoops? Back up your log file to /var/log/ksymoops/`date`
> and decode the tracebacks. Which kernel is this? Stock Debian? If so,
> send a mail to [EMAIL PROTECTED] and ask him whether he knows what
> it is. The problem is in the raid module, but is it caused by the SCSI
> error not being handled properly, or is it really an LVM problem?
> That is the question. Personally, I really don't know the FS layer
> well, so I can't say much more.

This is a Debian testing box with a 2.4.20-k6 kernel. I have ksymoops, but
in yesterday's log I see nothing that puts me on a trail. I have quite a
few <date><time>.ksyms and .modules files, but I see no errors in them.
Here is the end of a dmesg taken just now:

journal-601, buffer write failed
kernel BUG at prints.c:334!
invalid operand: 0000
CPU:    0
EIP:    0010:[<d0895059>]    Not tainted
EFLAGS: 00010282
eax: 00000024   ebx: d08a8340   ecx: 00000001   edx: 00000001
esi: c28a2000   edi: c28a2000   ebp: 00000009   esp: c13a3ee0
ds: 0018   es: 0018   ss: 0018
Process kupdated (pid: 6, stackpage=c13a3000)
Stack: d08a67da d08aa420 d08a8340 c13a3f04 d0d7b32c 00000000 d089f0be c28a2000
       d08a8340 00000038 00000012 00000010 00000000 d0d7b360 d0d7b354 0000000a
       00000000 c7896ce0 d08a27be c28a2000 d0d7b32c 00000001 c13a3f98 c28a2000
Call Trace:    [<d08a67da>] [<d08aa420>] [<d08a8340>] [<d089f0be>] [<d08a8340>]
  [<d08a27be>] [<d08a1abd>] [<d08a924f>] [<d0892845>] [<c01355a2>] [<c0134bd2>]
  [<c0134e35>] [<c010705c>]
 
Code: 0f 0b 4e 01 e0 67 8a d0 68 20 a4 8a d0 85 f6 74 16 0f b7 46
  I/O error: dev 08:11, sector 4042256
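
The "dev 08:11" in the I/O error can be decoded by hand. Below is a minimal
sketch, assuming the 2.4 kernel prints the device as hexadecimal major:minor
and that major 8 is the first SCSI disk major with 16 minors per disk.
decode_kdev is a hypothetical helper name used only for illustration, not an
existing tool:

```python
def decode_kdev(dev: str) -> str:
    """Map a 'MM:mm' hex major:minor pair to a /dev/sdXN name.

    Assumption: major 8 is the first SCSI disk major and each disk
    owns 16 consecutive minors (minor 0 of each group is the whole disk).
    """
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError(f"major {major} is not the SCSI disk major 8")
    disk, partition = divmod(minor, 16)
    name = "/dev/sd" + chr(ord("a") + disk)
    return name if partition == 0 else f"{name}{partition}"


print(decode_kdev("08:11"))
```

Under those assumptions the failing device is /dev/sdb1. That would match
the disk at SCSI id 2 only if the disks were detected in id order, which
this output alone does not prove.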

But the disks themselves were OK when I ran mkreiserfs on them: no
errors.
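
The "return code = 50000" and "return code = 3f0000" lines in the kern.log
quoted below can be split into their component bytes. This is a rough
sketch, assuming the value is printed in hexadecimal and packed as
driver/host/message/status bytes from high to low, with host-byte names
taken from the kernel's DID_* constants. split_scsi_result and HOST_CODES
are hypothetical names for this example:

```python
# Host-byte names (DID_* constants); values not listed are reported as unknown.
HOST_CODES = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def split_scsi_result(code_hex: str) -> dict:
    """Split a hex SCSI result word into its four raw bytes."""
    result = int(code_hex, 16)
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,          # raw status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": host,                     # set by the host adapter driver
        "driver": (result >> 24) & 0xFF,  # set by the mid-level driver
        "host_name": HOST_CODES.get(host, "unknown"),
    }


for code in ("50000", "3f0000"):
    print(code, split_scsi_result(code))
```

With that layout, 50000 carries host byte 0x05 (DID_ABORT), which fits the
repeated aborts in the log. The host byte 0x3f in the second code is not in
the table above, so the sketch leaves it as unknown rather than guessing.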

> 
> On Mon, 2003-10-27 at 18:58, Vincent Jamart wrote:
> > My server @home crashed with this in kern.log. It seems the SCSI disks
> > with LVM on them are the cause. It is as if they went to sleep and
> > never woke up again on an IRQ. This is the 2nd time it has happened in
> > 2 weeks. Note that the 5th device on the SCSI chain is an 8mm tape
> > drive that I power off after the weekly backup (it has the active
> > terminator).
> > 
> > If I boot it normally after a clean shutdown, lvm sees no VG during
> > the scan, and I have to restore the VGDA on each HDD (/dev/sdaN1).
> > After that, a vgscan sees the whole VG and the LVs are active. I can
> > then work normally on the reiserfs LVs (resize etc. work fine):
> > 
> > vgscan
> > vgdisplay -- no volume groups found
> > vgcfgrestore -f /etc/lvmconf/doc_vg.conf -n doc_vg /dev/sda1
> > vgcfgrestore -- VGDA for "doc_vg" successfully restored to physical volume 
> > "/dev/sda1"
> > ...
> > vgscan
> > vgchange -ay 
> > vgchange -- volume group "doc_vg" successfully activated
> > [EMAIL PROTECTED]:/LOG# vgdisplay
> > --- Volume group ---
> > VG Name               doc_vg
> > VG Access             read/write
> > VG Status             available/resizable
> > VG #                  0
> > MAX LV                256
> > Cur LV                2
> > Open LV               0
> > MAX LV Size           2 TB
> > Max PV                256
> > Cur PV                4
> > Act PV                4
> > VG Size               16.62 GB
> > PE Size               32 MB
> > Total PE              532
> > Alloc PE / Size       288 / 9 GB
> > Free  PE / Size       244 / 7.62 GB
> > VG UUID               gcyqUX-080P-l7px-arts-BEA3-v1UG-YH9ec2
> > 
> > [EMAIL PROTECTED]:/LOG# mount -a
> > [EMAIL PROTECTED]:/LOG# df
> > Filesystem           1K-blocks      Used Available Use% Mounted on
> > /dev/hda1              1269056   1151896    117160  91% /
> > /dev/hda2              2947828   1141336   1806492  39% /data
> > /dev/hdb2               814432     32896    781536   5% /data/ftp
> > /dev/doc_vg/lv_cd01    6291260   2599396   3691864  42% 
> > /data/www/documentation
> > /dev/doc_vg/lv_pg_data
> >                        3145628    327912   2817716  11% 
> > /var/lib/postgres/data
> > 
> > and at that point everything is OK.
> > 
> > That is a first problem, not blocking but to be solved ASAP. Last
> > week's crash happened when I added PPs to an LV: one of the 4 disks
> > (sdc1) had all its PPs free, and during the resize, crash/bang. After
> > unmounting everything and removing the scsi and lvm modules, I powered
> > my SCSI tower off and back on, ran modprobe again and remounted
> > everything (after restoring the VGDA all the same), and it was then
> > able to grow the LV and the reiserfs, as if disk sdc were OK.
> > This morning, the same thing again, but during file I/O.
> > 
> > There are 4 disks of 4 GB taken from dead pSeries machines and an Exabyte 8mm tape drive.
> > 
> > Here is the dump (sorry for the size), in case you have already run
> > into this... I'll look into it on my side tonight:
> >  
> > Oct 24 05:02:38 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT 
> > message
> > Oct 24 05:02:38 nabiki kernel: scsi0: Dumping Card State in Data-in phase, 
> > at SEQADDR 0x9d
> > Oct 24 05:02:38 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, 
> > ARG_2 = 0xff
> > Oct 24 05:02:38 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
> > Oct 24 05:02:38 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> > Oct 24 05:02:38 nabiki kernel:  DFCNTRL = 0x0, DFSTATUS = 0x28
> > Oct 24 05:02:38 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x44, SXFRCTL0 
> > = 0xa8
> > Oct 24 05:02:38 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
> > Oct 24 05:02:38 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
> > Oct 24 05:02:38 nabiki kernel: SCB count = 20
> > Oct 24 05:02:38 nabiki kernel: Kernel NEXTQSCB = 5
> > Oct 24 05:02:38 nabiki kernel: Card NEXTQSCB = 11
> > Oct 24 05:02:38 nabiki kernel: QINFIFO entries: 11
> > Oct 24 05:02:38 nabiki kernel: Waiting Queue entries:
> > Oct 24 05:02:38 nabiki kernel: Disconnected Queue entries:
> > Oct 24 05:02:38 nabiki kernel: QOUTFIFO entries:
> > Oct 24 05:02:38 nabiki kernel: Sequencer Free SCB List: 2 0
> > Oct 24 05:02:38 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, 
> > t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
> > Oct 24 05:02:38 nabiki kernel: Pending list: 11(c 0x68, s 0x27, l 0), 0(c 
> > 0x68, s 0x27, l 0)
> > Oct 24 05:02:38 nabiki kernel: Kernel Free SCB list: 14 2 9 13 4 3 1 19 7 
> > 8 10 6 12 15 18 17 16
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:0:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:2:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:3:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:5:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: DevQ(0:6:0): 0 waiting
> > Oct 24 05:02:38 nabiki kernel: scsi0:0:2:0: Device is active, asserting 
> > ATN
> > Oct 24 05:02:38 nabiki kernel: Recovery code sleeping
> > Oct 24 05:02:38 nabiki kernel: Recovery code awake
> > Oct 24 05:02:38 nabiki kernel: aic7xxx_abort returns 0x2002
> > Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT 
> > message
> > Oct 24 05:02:48 nabiki kernel: scsi0: Dumping Card State in Data-in phase, 
> > at SEQADDR 0x9d
> > Oct 24 05:02:48 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, 
> > ARG_2 = 0xff
> > Oct 24 05:02:48 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
> > Oct 24 05:02:48 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> > Oct 24 05:02:48 nabiki kernel:  DFCNTRL = 0x0, DFSTATUS = 0x28
> > Oct 24 05:02:48 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x54, SXFRCTL0 
> > = 0xa8
> > Oct 24 05:02:48 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
> > Oct 24 05:02:48 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
> > Oct 24 05:02:48 nabiki kernel: SCB count = 20
> > Oct 24 05:02:48 nabiki kernel: Kernel NEXTQSCB = 14
> > Oct 24 05:02:48 nabiki kernel: Card NEXTQSCB = 11
> > Oct 24 05:02:48 nabiki kernel: QINFIFO entries: 11 5
> > Oct 24 05:02:48 nabiki kernel: Waiting Queue entries:
> > Oct 24 05:02:48 nabiki kernel: Disconnected Queue entries:
> > Oct 24 05:02:48 nabiki kernel: QOUTFIFO entries:
> > Oct 24 05:02:48 nabiki kernel: Sequencer Free SCB List: 2 0
> > Oct 24 05:02:48 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, 
> > t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
> > Oct 24 05:02:48 nabiki kernel: Pending list: 5(c 0x68, s 0x27, l 0), 11(c 
> > 0x68, s 0x27, l 0), 0(c 0x68, s 0x27, l 0)
> > Oct 24 05:02:48 nabiki kernel: Kernel Free SCB list: 2 9 13 4 3 1 19 7 8 
> > 10 6 12 15 18 17 16
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:0:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:2:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:3:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:5:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:6:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Cmd aborted from QINFIFO
> > Oct 24 05:02:48 nabiki kernel: aic7xxx_abort returns 0x2002
> > Oct 24 05:02:48 nabiki kernel: scsi0:0:2:0: Attempting to queue an ABORT 
> > message
> > Oct 24 05:02:48 nabiki kernel: scsi0: Dumping Card State in Data-in phase, 
> > at SEQADDR 0x9d
> > Oct 24 05:02:48 nabiki kernel: ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, 
> > ARG_2 = 0xff
> > Oct 24 05:02:48 nabiki kernel: HCNT = 0x0 SCBPTR = 0x1
> > Oct 24 05:02:48 nabiki kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> > Oct 24 05:02:48 nabiki kernel:  DFCNTRL = 0x0, DFSTATUS = 0x28
> > Oct 24 05:02:48 nabiki kernel: LASTPHASE = 0x40, SCSISIGI = 0x54, SXFRCTL0 
> > = 0xa8
> > Oct 24 05:02:48 nabiki kernel: SSTAT0 = 0x7, SSTAT1 = 0x2
> > Oct 24 05:02:48 nabiki kernel: STACK == 0x9a, 0x19b, 0x15a, 0x0
> > Oct 24 05:02:48 nabiki kernel: SCB count = 20
> > Oct 24 05:02:48 nabiki kernel: Kernel NEXTQSCB = 11
> > Oct 24 05:02:48 nabiki kernel: Card NEXTQSCB = 14
> > Oct 24 05:02:48 nabiki kernel: QINFIFO entries: 14
> > Oct 24 05:02:48 nabiki kernel: Waiting Queue entries:
> > Oct 24 05:02:48 nabiki kernel: Disconnected Queue entries:
> > Oct 24 05:02:48 nabiki kernel: QOUTFIFO entries:
> > Oct 24 05:02:48 nabiki kernel: Sequencer Free SCB List: 2 0
> > Oct 24 05:02:48 nabiki kernel: Sequencer SCB Info: 0(c 0x68, s 0x27, l 0, 
> > t 0xff) 1(c 0x68, s 0x27, l 0, t 0x0) 2(c 0x68, s 0x27, l 0, t 0xff)
> > Oct 24 05:02:48 nabiki kernel: Pending list: 14(c 0x68, s 0x27, l 0), 0(c 
> > 0x68, s 0x27, l 0)
> > Oct 24 05:02:48 nabiki kernel: Kernel Free SCB list: 5 2 9 13 4 3 1 19 7 8 
> > 10 6 12 15 18 17 16
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:0:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:2:0): 0 waiting
> > Oct 24 05:02:48 nabiki kernel: DevQ(0:3:0): 0 waiting
> > ...
> > Oct 24 05:04:44 nabiki kernel: scsi0:0:2:0: Cmd aborted from QINFIFO
> > Oct 24 05:04:44 nabiki kernel: aic7xxx_abort returns 0x2002
> > Oct 24 05:04:44 nabiki kernel: scsi: device set offline - not ready or 
> > command retry failed after bus reset: host 0 channel 0 id 2 lun 0
> > Oct 24 05:04:44 nabiki kernel: SCSI disk error : host 0 channel 0 id 2 lun 
> > 0 return code = 50000
> > Oct 24 05:04:44 nabiki kernel:  I/O error: dev 08:11, sector 3949040
> > Oct 24 05:04:44 nabiki kernel:  I/O error: dev 08:11, sector 3949048
> > Oct 24 05:04:44 nabiki kernel: SCSI disk error : host 0 channel 0 id 2 lun 
> > 0 return code = 3f0000
> > Oct 24 05:04:44 nabiki kernel:  I/O error: dev 08:11, sector 4012624
> > Oct 24 05:04:44 nabiki kernel:  I/O error: dev 08:11, sector 4012632
> > Oct 24 05:04:44 nabiki kernel: journal-601, buffer write failed
> > Oct 24 05:04:44 nabiki kernel: kernel BUG at prints.c:334!
> > Oct 24 05:04:44 nabiki kernel: invalid operand: 0000
> > Oct 24 05:04:44 nabiki kernel: CPU:    0
> > Oct 24 05:04:44 nabiki kernel: EIP:    
> > 0010:[md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2793383/96]  
> >   
> > Not tainted
> > Oct 24 05:04:44 nabiki kernel: EFLAGS: 00010282
> > Oct 24 05:04:44 nabiki kernel: eax: 00000024   ebx: d08a8340   ecx: 
> > 00000001   edx: 00000001
> > Oct 24 05:04:44 nabiki kernel: esi: c50abc00   edi: c50abc00   ebp: 
> > 0000000d   esp: c13a3ee0
> > Oct 24 05:04:44 nabiki kernel: ds: 0018   es: 0018   ss: 0018
> > Oct 24 05:04:44 nabiki kernel: Process kupdated (pid: 6, 
> > stackpage=c13a3000)
> > Oct 24 05:04:44 nabiki kernel: Stack: d08a67da d08aa420 d08a8340 c13a3f04 
> > d0d7ad88 00000000 d089f0be c50abc00
> > Oct 24 05:04:44 nabiki kernel:        d08a8340 00000025 00000012 00000010 
> > 00000000 d0d7adbc d0d7adb0 0000000e
> > Oct 24 05:04:44 nabiki kernel:        00000000 c77432c0 d08a27be c50abc00 
> > d0d7ad88 00000001 c13a3f98 c50abc00
> > Oct 24 05:04:44 nabiki kernel: Call Trace:    
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2721830/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2706400/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2714816/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2752322/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2714816/96]
> > Oct 24 05:04:44 nabiki kernel:   
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2738242/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2741571/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2710961/96] 
> > [md:__insmod_md_O/lib/modules/2.4.20-k6/kernel/drivers/md/md.o_+-2803643/96] 
> > [sync_supers+222/288] [sync_old_buffers+14/68]
> > Oct 24 05:04:44 nabiki kernel:   [kupdate+217/252] [kernel_thread+40/56]
> > Oct 24 05:04:44 nabiki kernel:
> > Oct 24 05:04:44 nabiki kernel: Code: 0f 0b 4e 01 e0 67 8a d0 68 20 a4 8a 
> > d0 85 f6 74 16 0f b7 46
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________________
> > Linux Mailing List - http://www.unixtech.be
> > Subscribe/Unsubscribe: http://www.unixtech.be/mailman/listinfo/linux
> > Archives: http://www.mail-archive.com/[EMAIL PROTECTED]
> > IRC: efnet.unixtech.be:6667 - #unixtech
> 



