This is a networking bug in 2.2.11.

Upgrade your kernel to 2.2.14, and get the new raid patch and raidtools from

www.redhat.com/~mingo/

allan
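A rough sketch of that upgrade, assuming the usual /usr/src kernel-tree layout (the tarball and patch file names below are illustrative, not the actual names -- check Ingo's page for those):

```shell
# unpack a fresh 2.2.14 tree (file names are assumptions)
cd /usr/src
tar xzf linux-2.2.14.tar.gz

# apply the matching raid patch downloaded from www.redhat.com/~mingo/
cd linux
patch -p1 < ../raid-2.2.14-patch

# then configure and build the kernel, and rebuild the matching raidtools
```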

Bernd Burgstaller <[EMAIL PROTECTED]> said:

> Dear all!
> 
> I am writing this mail because of hangups related to my raid devices, and I
> am seeking suggestions that would help me locate the problem. Any suggestions
> are welcome! Below you will find a description of my system as well as of the
> problems. If you need further information, please let me know.
> 
> Best regards & thanks in advance,
> Bernd Burgstaller
> 
> 
> 0.0 Definitions
> 
> 
> In the following I use the term 'rescue system' for a non-raid linux
> installation.
> 'Production system' denotes the linux installation on raid devices.
> 
> 
> 1.0 Symptoms
> 
> 
> The symptoms are always the same: after some uptime (30 seconds up to
> several hours), the system locks. With X, this results in a frozen screen,
> and switching to a text console is not possible. Without X, it is sometimes
> possible to switch to other text consoles and type at the corresponding
> login prompt. However, after pressing return at the login prompt, that
> console locks up, too.
> From the outside, the locked system is often still ping-able. When telneting
> to the locked system, a login prompt appears for the first telnet attempt;
> after entering a login and pressing return, the session times out. Further
> telnet attempts are refused by the locked system.
> In general, TCP connects are no longer possible; they are refused by the
> kernel (e.g. rpcinfo -p HOST).
> 
> 2.0 Disks
> 
> 
> The system contains 3 SCSI disks, detected by the kernel as follows:
> 
> (scsi0) <Adaptec AIC-7890/1 Ultra2 SCSI host adapter> found at PCI 6/0
> (scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
> (scsi0) Downloading sequencer code... 374 instructions downloaded
> scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.19/3.2.4
>        <Adaptec AIC-7890/1 Ultra2 SCSI host adapter>
> scsi : 1 host.
> (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 31.
>   Vendor: IBM       Model: DNES-309170W      Rev: SA30
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> (scsi0:0:2:0) Synchronous at 20.0 Mbyte/sec, offset 15.
>   Vendor: PLEXTOR   Model: CD-ROM PX-40TS    Rev: 1.01
>   Type:   CD-ROM                             ANSI SCSI revision: 02
> Detected scsi CD-ROM sr0 at scsi0, channel 0, id 2, lun 0
> (scsi0:0:3:0) Synchronous at 10.0 Mbyte/sec, offset 32.
>   Vendor: HP        Model: C1537A            Rev: L708
>   Type:   Sequential-Access                  ANSI SCSI revision: 02
> Detected scsi tape st0 at scsi0, channel 0, id 3, lun 0
> (scsi0:0:13:0) Synchronous at 80.0 Mbyte/sec, offset 31.
>   Vendor: IBM       Model: DNES-309170W      Rev: SA30
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> Detected scsi disk sdb at scsi0, channel 0, id 13, lun 0
> (scsi0:0:14:0) Synchronous at 80.0 Mbyte/sec, offset 31.
>   Vendor: IBM       Model: DNES-309170W      Rev: SA30
>   Type:   Direct-Access                      ANSI SCSI revision: 03
> Detected scsi disk sdc at scsi0, channel 0, id 14, lun 0
> scsi : detected 1 SCSI tape 1 SCSI cdrom 3 SCSI disks total.
> Uniform CDROM driver Revision: 2.55
> SCSI device sda: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
> SCSI device sdb: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
> SCSI device sdc: hdwr sector= 512 bytes. Sectors= 17916240 [8748 MB] [8.7 GB]
> 
> 
> 2.1 Disk Partitions
> 
> 
> The following figures are reported by fdisk's p(rint) command:
> 
> Disk /dev/sda: 255 heads, 63 sectors, 1115 cylinders
> Units = cylinders of 16065 * 512 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sda1             1       128   1028128+  fd  Linux raid autodetect
> /dev/sda2           129       146    144585   fd  Linux raid autodetect
> /dev/sda3           147       529   3076447+  fd  Linux raid autodetect
> /dev/sda4           530      1115   4707045    5  Extended
> /dev/sda5           530      1039   4096543+  fd  Linux raid autodetect
> /dev/sda6          1040      1115    610438+  82  Linux swap
> 
> Disk /dev/sdb: 255 heads, 63 sectors, 1115 cylinders
> Units = cylinders of 16065 * 512 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdb1             1       128   1028128+  fd  Linux raid autodetect
> /dev/sdb2           129       146    144585   fd  Linux raid autodetect
> /dev/sdb3           147       529   3076447+  fd  Linux raid autodetect
> /dev/sdb4           530      1115   4707045    5  Extended
> /dev/sdb5           530      1039   4096543+  fd  Linux raid autodetect
> /dev/sdb6          1040      1115    610438+  83  Linux
> 
> Disk /dev/sdc: 255 heads, 63 sectors, 1115 cylinders
> Units = cylinders of 16065 * 512 bytes
> 
>    Device Boot    Start       End    Blocks   Id  System
> /dev/sdc1             1       128   1028128+  82  Linux swap
> /dev/sdc2           129       256   1028160   83  Linux
> /dev/sdc3           257      1039   6289447+  83  Linux
> /dev/sdc4          1040      1115    610470   83  Linux
> 
> 
> 2.2 Mapping of Partitions to MD Devices
> 
> 
> --------------------------------------------------------------------------------
>  Disk 1        Raid Dev  System     Size      Mnt(Prod)  Mnt(Resc)  Notes
> --------------------------------------------------------------------------------
>  /dev/sda1     /dev/md1  raid1-0-0  1.000GB   /var
>  /dev/sda2     /dev/md0  raid1-2-0  0.140GB   /
>  /dev/sda3     /dev/md2  raid1-3-0  3.000GB   /usr
>  /dev/sda5     /dev/md3  raid1-4-0  4.000GB   /home
>  /dev/sda6               swap       0.600GB   none       none       Swapspace 2
> --------------------------------------------------------------------------------
> 
> 
> --------------------------------------------------------------------------------
>  Disk 2        Raid Dev  System     Size      Mnt(Prod)  Mnt(Resc)  Notes
> --------------------------------------------------------------------------------
>  /dev/sdb1     /dev/md1  raid1-0-1  1.000GB   /var
>  /dev/sdb2     /dev/md0  raid1-2-1  0.140GB   /
>  /dev/sdb3     /dev/md2  raid1-3-1  3.000GB   /usr
>  /dev/sdb5     /dev/md3  raid1-4-1  4.000GB   /home
>  /dev/sdb6               ext2       0.600GB              /          Rescue Mirror
> --------------------------------------------------------------------------------
> 
> 
> --------------------------------------------------------------------------------
>  Disk 3        Raid Dev  System     Size      Mnt(Prod)  Mnt(Resc)  Notes
> --------------------------------------------------------------------------------
>  /dev/sdc1               swap       1.000GB   none       none       Swapspace 1
>  /dev/sdc2               ext2       1.000GB   /tmp       /tmp       temporary data
>  /dev/sdc3               ext2       6.140GB   /V                    VMWare, ..
>  /dev/sdc4               ext2       0.600GB              /          Rescue System
> --------------------------------------------------------------------------------
> 
> 
> 2.3 Raid Configuration
> 
> 
> Here's my /etc/raidtab:
> 
> raiddev /dev/md1
>         raid-level      1
>         nr-raid-disks   2
>         nr-spare-disks  0
>         chunk-size      4
>         persistent-superblock   1
>         device          /dev/sda1
>         raid-disk       0
>         device          /dev/sdb1
>         raid-disk       1
> 
> raiddev /dev/md0
>         raid-level      1
>         nr-raid-disks   2
>         nr-spare-disks  0
>         chunk-size      4
>         persistent-superblock   1
>         device          /dev/sda2
>         raid-disk       0
>         device          /dev/sdb2
>         raid-disk       1
> 
> raiddev /dev/md2
>         raid-level      1
>         nr-raid-disks   2
>         nr-spare-disks  0
>         chunk-size      4
>         persistent-superblock   1
>         device          /dev/sda3
>         raid-disk       0
>         device          /dev/sdb3
>         raid-disk       1
> 
> raiddev /dev/md3
>         raid-level      1
>         nr-raid-disks   2
>         nr-spare-disks  0
>         chunk-size      4
>         persistent-superblock   1
>         device          /dev/sda5
>         raid-disk       0
>         device          /dev/sdb5
>         raid-disk       1
> 
> 
> And the output of cat /proc/mdstat:
> 
> guldin:~ # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid5] [translucent] 
> read_ahead 1024 sectors
> md1 : active raid1 sdb1[1] sda1[0] 1028032 blocks [2/2] [UU]
> md0 : active raid1 sdb2[1] sda2[0] 144512 blocks [2/2] [UU]
> md2 : active raid1 sdb3[1] sda3[0] 3076352 blocks [2/2] [UU]
> md3 : active raid1 sdb5[1] sda5[0] 4096448 blocks [2/2] [UU]
> unused devices: <none>
> 
> 
> 2.4 Fstab
> 
> 
> The /etc/fstab file below shows how the md devices make up the filesystem of
> the production system.
> 
> /dev/sdc1       swap                      swap            defaults   0   0
> /dev/sda6       swap                      swap            defaults   0   0
> /dev/sdc2       /tmp                      ext2            defaults   1   2
> /dev/sdc3       /V                        ext2            defaults   1   2
> /dev/md0        /                         ext2            defaults   1   1
> /dev/md2        /usr                      ext2            defaults   1   2
> /dev/md1        /var                      ext2            defaults   1   2
> /dev/md3        /home                     ext2            defaults   1   2
> 
> /dev/scd0       /cdrom                    iso9660         ro,noauto,user,exec 0   0
> 
> /dev/fd0        /floppy                   auto            noauto,user 0   0
> 
> none            /proc                     proc            defaults   0   0
> # End of YaST-generated fstab lines
> 
> 
> 3.0 Kernel
> 
> 
> I used a stock 2.2.11 kernel and patched it with raid0145-19990824-2.2.11,
> which I got from ftp://ftp.fi.kernel.org/pub/linux/daemons/raid/alpha. The
> raidtools from the same location are raidtools-19990824-0.90.tar.gz.
> 
> Attached to this mail is the kernel configuration I used.
> 
> 
> 4.0 Installation
> 
> 
> The Linux distribution used is SuSE 6.3.
> 
> Initially I installed the rescue system on /dev/sdc4. There I patched the
> 2.2.11 kernel, enabled raid support, compiled, and booted the rescue system
> with that kernel.
> 
> Next I partitioned /dev/sda and /dev/sdb according to Section 2.1; mkraid
> then created the md devices.
> 
> From the rescue system I mounted the md devices under /mnt according to the
> fs structure given in Section 2.4.
> 
> Finally I installed the production system into the /mnt dir, then changed
> the YaST-generated fstab file to match the actual situation.
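The mkraid and mount steps above can be sketched roughly as follows, assuming raidtools 0.90 syntax and the layout from Sections 2.2-2.4 (the mail does not list a mkfs step, but the fstab implies ext2 on each array):

```shell
# create the arrays described in /etc/raidtab (destroys existing data!)
mkraid /dev/md0
mkraid /dev/md1
mkraid /dev/md2
mkraid /dev/md3

# make filesystems (assumed step; ext2 per Section 2.4), then mount the
# future production fs under /mnt, mirroring the fstab layout
mke2fs /dev/md0
mount /dev/md0 /mnt
mkdir -p /mnt/usr /mnt/var /mnt/home
mount /dev/md2 /mnt/usr
mount /dev/md1 /mnt/var
mount /dev/md3 /mnt/home
```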
> 
> 
> 5.0 Booting
> 
> 
> Initially I booted the production system from a boot floppy. However, after
> getting the Red Hat patch for lilo V21 I attempted to install lilo in the
> MBR. Since my /boot dir is also on an md device, this did not work out,
> because lilo did not know how to handle a 0x9.. device (the kernel seems to
> report md devices as 0x9.. instead of 0x8..). I changed lilo to accept
> 0x9.. and do the same as if it were a 0x8.. device. I admit that this was a
> quick hack, but note that I experienced the first hangup of the system
> before fiddling with lilo.
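The 0x9.. vs 0x8.. in the paragraph above is just the block-device major number: md devices use major 9, SCSI disks major 8, and the 16-bit device code lilo deals with is (major << 8) | minor. A quick way to check on the shell:

```shell
# block-device majors: md = 9, SCSI disk = 8 (ls -l /dev/md0 /dev/sda2
# shows the major/minor pair, e.g. "9, 0" for md0 and "8, 2" for sda2)
#
# lilo works with the 16-bit device code (major << 8) | minor:
printf '0x%x\n' $(( (9 << 8) | 0 ))   # /dev/md0  -> 0x900
printf '0x%x\n' $(( (8 << 8) | 2 ))   # /dev/sda2 -> 0x802
```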
> 
> Below is my lilo.conf
> 
> # Start LILO global Section
> boot=/dev/md0
> linear
> #compact       # faster, but won't work on all systems.
> vga=normal
> read-only
> prompt
> timeout=100
> # End LILO global Section
> #
> image = /boot/vmlinuz.raid
>   root = /dev/md0
>   label = linux
> 
> And lilo's comments on it:
> 
> boot = /dev/sda, map = /boot/map.0802
> Added linux *
> boot = /dev/sdb, map = /boot/map.0812
> Added linux
> 
> 
> 6.0 Diagnostics
> 
> 
> In order to track the problem I enabled syslogd *.* logging on a non-raid
> device as well as logging over the net to another machine. However, no more
> log entries are generated once the system locks :-(
> 
> In my opinion this could have two reasons:
> 
> (1) Either the system is so damaged that even the logging mechanisms are
>     broken.
> 
>     Is there a general strategy for logging in such weird cases? Did anybody
>     ever try to directly access a tty from within the kernel to push the
>     logs out via a serial line and capture them on another computer?
> 
> (2) The reason for the lock generates no log entry. I do not know whether
>     there are more compile-switch dependent printk's waiting in the kernel.
>     Hints on kernel switches that make the kernel more verbose would be
>     appreciated. Furthermore, I could add further printk's at interesting
>     places; suggestions are welcome.
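For point (1), one option on a 2.2 kernel is a serial console: build the kernel with CONFIG_SERIAL_CONSOLE enabled and redirect console output with the console= boot parameter, then capture it on a second machine over a null-modem cable. A sketch (port and speed are assumptions):

```shell
# /etc/lilo.conf -- add to the image section (assumed: first serial
# port, 9600 baud; keep tty0 so the local console still works):
#   append = "console=ttyS0,9600 console=tty0"
# then rerun lilo and reboot; kernel printk output also goes to ttyS0.

# on the capturing machine, attach to the serial line, e.g.:
cu -l /dev/ttyS0 -s 9600
```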
> 
> 
> 7.0 Notes
> 
> 
> 7.1 I never experienced a hangup of the rescue system (despite having it
>     compile kernels all night :-) For that reason I suspect that the locks
>     are somehow related to raid.
> 
> 
> 7.2 Although I experienced locks without amd as well, using amd locks the
>     system within minutes. However, I have to admit that I did not try amd
>     on the rescue system...
>     Note that I have disabled kernel autofs support.
>  
> 
> 


