[EMAIL PROTECTED] wrote:
>
> Linuxers,
>
> Thank you for your time and any help you can offer. Does
> anybody know how to correct an unreliable SCSI disk? Has anyone had
> a similar experience? How was it solved?
>
> In February, I purchased a 9 GB SCSI hard disk and an Ultra Wide
> SCSI 2 controller. I upgraded to Linux 2.2 and have found the system
> too unreliable for its intended purpose, CD-ROM generation.
>
> I can duplicate the problem by simply copying a 300K file repeatedly
> to fill the partition, and then comparing each copy to the original. When
> a bash script is used, ~3 out of ~7500 comparisons fail. When perl is
> used, ~6 out of ~7500 comparisons fail.
>
> ---- discoveries ----
> o If the file copy step is skipped (the same disk image files
> are compared again), the frequency of comparison failures
> remains the same, but the specific files that fail differ.
> (I believe this implies that reads during comparison are
> failing and that writes during copy
> are error-free.)
> o Using fdisk(8), I have tried setting the end of partition 4 at
> 1106 cylinders and at 1023 cylinders. There appears to be
> no effect. After writing the partition table, I power
> cycled the computer. After recreating the file system
> with mke2fs(8), I power cycled the computer.
> o badblocks(8) write-mode test found no bad blocks.
> o Though not as easily or thoroughly tested, all other partitions
> appear to be reliable.
> ---- discoveries ----
>
>
> ---- duplication algorithm ----
> #! /usr/bin/perl
>
> # Set the following to 0 to use the same disk file images.
> if (1) {
> #
> # Remove the copies left over from the previous run
> #
> for ($f = 0; $f < 8; $f++) {
> system("rm buffer/suspect.$f*");
> }
>
> #
> # Fill the disk
> #
> for ($a = 0; $a < 8; $a++) {
> for ($b = 0; $b < 10; $b++) {
> for ($c = 0; $c < 10; $c++) {
> for ($d = 0; $d < 10; $d++) {
> system("cp random300K.bin buffer/suspect.$a$b$c$d");
> }
> }
> }
> }
> }
>
> #
> # Verify the copy
> #
> @bad = ();
> for ($a = 0; $a < 6; $a++) {
> for ($b = 0; $b < 10; $b++) {
> for ($c = 0; $c < 10; $c++) {
> for ($d = 0; $d < 10; $d++) {
> if ("$a$b$c$d" < 5813) {
> if (system("cmp -s random300K.bin buffer/suspect.$a$b$c$d")) {
> push @bad, "$a$b$c$d";
> print "Bad block: $a$b$c$d\n";
> }
> }
> }
> }
> }
> }
>
> #
> # Record the bad blocks
> #
> $count = scalar(@bad);
> system("echo $count >> /mnt/scsi_000p2/tmp/count");
>
> #
> # Mark bad blocks
> #
> foreach $b (@bad) {
>
> local($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,
> $ctime,$blksize,$blocks) = stat("buffer/suspect.$b");
> print "Renaming buffer/suspect.$b -> /mnt/scsi_000p4/bad_blocks/$ino\n";
> rename "buffer/suspect.$b","/mnt/scsi_000p4/bad_blocks/$ino";
> }
> ---- duplication algorithm ----
>
> ---- partition table ----
> Disk /dev/sda: 255 heads, 63 sectors, 1106 cylinders
> Units = cylinders of 16065 * 512 bytes
>
> Device Boot Begin Start End Blocks Id System
> /dev/sda1 1 1 368 2955928+ 6 DOS 16-bit >=32M
> /dev/sda2 * 369 369 736 2955960 83 Linux native
> /dev/sda3 737 737 744 64260 82 Linux swap
> /dev/sda4 745 745 1023 2241067+ 83 Linux native
> ---- partition table ----
>
> ---- platform ----
> Platform: Dell Dimension XPS 133c
> CPU: Pentium 133MHz, 256K cache
> OS: linux-2.2.1
> scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.10/3.2.4
> <Adaptec AHA-294X Ultra2 SCSI host adapter>
> scsi1 : SCSI host adapter emulation for IDE ATAPI devices
> scsi : 2 hosts.
> Vendor: SEAGATE Model: ST39173LW Rev: 6246
> Type: Direct-Access ANSI SCSI revision: 02
> Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
> Vendor: HP Model: CD-Writer+ 8100 Rev: 1.0g
> Type: CD-ROM ANSI SCSI revision: 02
> ---- platform ----
>
> --
> Dr. Robert J. Meier
> 1-248-650-9488
> [EMAIL PROTECTED]
This is not an answer to your problem, but it may help elicit one. A
few days ago a SIG11 problem was discussed on this list, and
someone provided a link (I thought I had saved it, but I cannot locate it
now). That site covered more than SIG11 errors: it also discussed
random errors caused by hardware, which seems to be what you
are experiencing. If the original poster of that link would _please_
repost it, you may find some help at that site.
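
In the meantime, one way to test your read-versus-write theory might be
to read each suspect copy several times and see whether the mismatch is
the same on every pass. This is only a rough sketch: the helper names
file_digest and classify are made up for illustration, and it assumes
the Digest::MD5 module is installed. Repeated reads of one file may be
served from the buffer cache, so the passes are only meaningful if
enough other data is read in between to push the file out of memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return the MD5 hex digest of a file's contents.
sub file_digest {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical helper: given the digest of the original and an array ref
# of digests (one per read pass over the same copy), say whether the copy
# matched on every pass, failed on every pass, or failed only sometimes.
sub classify {
    my ($reference, $digests) = @_;
    my $mismatches = grep { $_ ne $reference } @$digests;
    return 'ok'              if $mismatches == 0;
    return 'stable-mismatch' if $mismatches == @$digests;
    return 'intermittent';
}

# Usage: script original copy1 [copy2 ...]
if (@ARGV >= 2) {
    my ($original, @copies) = @ARGV;
    my $reference = file_digest($original);
    for my $copy (@copies) {
        # Three passes is an arbitrary choice for illustration.
        my @passes = map { file_digest($copy) } 1 .. 3;
        print "$copy: ", classify($reference, \@passes), "\n";
    }
}
```

A copy reported as "intermittent" would point at the read path (cable,
termination, controller, memory), while "stable-mismatch" would suggest
that the data on the platter is actually wrong.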
Ralph