My I/O exerciser will occasionally "hang" under certain subsystem
error-injection scenarios.  It can run OK for anywhere between 0 and 38
attempts before failing.

The only difference I have noticed on the failed attempts is a
"scsidisk I/O error" entry in /var/log/messages that does not appear on
the successful error-inject attempts:

   Apr 30 07:43:26 linux1 kernel: SCSI host 2 abort (pid 707672569) timed out - resetting
   Apr 30 07:43:26 linux1 kernel: SCSI bus is being reset for host 2 channel 0.
   Apr 30 07:44:28 linux1 kernel: SCSI disk error : host 2 channel 0 id 6 lun 6 return code = 26030000
-->Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:71, sector 23801000

Grepping in /usr/src/linux/drivers/scsi shows that the message comes from
end_scsi_request() in scsi.h:

   static Scsi_Cmnd * end_scsi_request(Scsi_Cmnd * SCpnt, int uptodate, int sectors)
   {
       struct request * req;
       struct buffer_head * bh;

       req = &SCpnt->request;
       req->errors = 0;
       if (!uptodate) {
--->        printk(DEVICE_NAME " I/O error: dev %s, sector %lu\n",   <---
                  kdevname(req->rq_dev), req->sector);
       }

       do {
           if ((bh = req->bh) != NULL) {
               req->bh = bh->b_reqnext;
               req->nr_sectors -= bh->b_size >> 9;
               req->sector += bh->b_size >> 9;
               bh->b_reqnext = NULL;
               sectors -= bh->b_size >> 9;
               bh->b_end_io(bh, uptodate);
               if ((bh = req->bh) != NULL) {
                   req->current_nr_sectors = bh->b_size >> 9;
                   if (req->nr_sectors < req->current_nr_sectors) {
                       req->nr_sectors = req->current_nr_sectors;
                       printk("end_scsi_request: buffer-list destroyed\n");
                   }
               }
           }
       } while(sectors && bh);

If "!uptodate" is an error worthy of a printk, shouldn't some sort of error
be returned back?
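
Looking at the loop again, the status does appear to be handed on per
buffer: each buffer head gets bh->b_end_io(bh, uptodate), so a zero
"uptodate" reaches whoever owns the buffer even though the function
itself only prints.  A minimal user-space model of that callback path is
sketched below.  The names model_bh, model_end_io and model_end_request
are hypothetical, and the sector accounting of the real loop is left
out; this is only meant to show where the status goes, not how the
kernel implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: just a chain link,
 * a completion callback, and fields the callback fills in. */
struct model_bh {
    struct model_bh *next;
    int uptodate;    /* status delivered by the callback */
    int completed;   /* non-zero once the callback has run */
    void (*end_io)(struct model_bh *bh, int uptodate);
};

/* Hypothetical completion callback: records the status on the buffer. */
static void model_end_io(struct model_bh *bh, int uptodate)
{
    bh->uptodate = uptodate;
    bh->completed = 1;
}

/* Complete every buffer in the chain with the same status, the way the
 * quoted loop calls b_end_io for each bh.  Returns the number of
 * buffers completed. */
static int model_end_request(struct model_bh *head, int uptodate)
{
    int n = 0;

    while (head != NULL) {
        struct model_bh *next = head->next;

        assert(head->end_io != NULL);
        head->next = NULL;              /* unlink, like b_reqnext = NULL */
        head->end_io(head, uptodate);   /* status travels via callback */
        head = next;
        n++;
    }
    return n;
}
```

In this model a failed request (uptodate == 0) leaves every buffer in
the chain marked completed and not up to date, which is the only "error
return" the caller of the block layer would see.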


One strange thing (to me) is the pattern in the "scsidisk I/O error"
messages: they are only reported for minor numbers ending in "1", and ALL
of those map to devices (/dev/sdXY) that I do not use:

  Apr 30 07:43:26 linux1 kernel: SCSI host 2 abort (pid 707672569) timed out - resetting
  Apr 30 07:43:26 linux1 kernel: SCSI bus is being reset for host 2 channel 0.
  Apr 30 07:44:28 linux1 kernel: SCSI disk error : host 2 channel 0 id 6 lun 6 return code = 26030000
  Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:71, sector 23801000
  Apr 30 07:44:28 linux1 kernel: SCSI disk error : host 2 channel 0 id 6 lun 5 return code = 26030000
  Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:61, sector 24565544
  Apr 30 07:44:28 linux1 kernel: SCSI disk error : host 2 channel 0 id 6 lun 4 return code = 26030000
  Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:51,
  Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:11, sector 23749072
  Apr 30 07:44:28 linux1 kernel: SCSI disk error : host 2 channel 0 id 6 lun 2 return code = 26030000
  Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:31, sector 31594584
  Apr 30 07:44:28 linux1 kernel: scsi2 channel 0 : resetting for second half of retries.
  Apr 30 07:44:28 linux1 kernel: SCSI bus is being reset for host 2 channel 0.
...snip...
This continues on (and on) for dev  08:11, 08:21, 08:31, 08:41, 08:51,
08:61, and 08:71

I have one partition per LUN, so I use dev: 08:1, 08:17, 08:33,...,
08:n+16, 65:1, 65:17, ...

So the "scsidisk I/O error" could be an artifact of some recovery process
and not the root of the problem.  Still, if there are no devices
associated with the major:minor reported in the message, where did it get
the sector number, and what was it trying to do with it?
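
One reading worth ruling out first: if the "dev MM:mm" field printed by
kdevname() is hexadecimal rather than decimal (my assumption, to be
checked against the kernel source), then 08:11 is minor 17 and 08:71 is
minor 113, which would be partitions that ARE in use.  The sketch below
decodes such a string under that assumption, together with the usual sd
layout of 16 minors per disk.  The struct and function names are
hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* Assumptions: the "MM:mm" text is hex, and the sd driver gives each
 * disk 16 minors (0 = whole disk, 1..15 = partitions). */
struct sd_dev {
    unsigned major_nr;
    unsigned minor_nr;
    unsigned disk;       /* 0 = first disk on this major */
    unsigned partition;  /* 0 = whole disk */
};

/* Parse a string such as "08:71".  Returns 0 on success, -1 if the
 * string does not contain two hex fields separated by ':'. */
static int decode_kdev(const char *s, struct sd_dev *out)
{
    unsigned maj, min;

    assert(s != NULL && out != NULL);
    if (sscanf(s, "%x:%x", &maj, &min) != 2)
        return -1;

    out->major_nr = maj;
    out->minor_nr = min;
    out->disk = min >> 4;         /* which disk on this major */
    out->partition = min & 0x0f;  /* which partition on that disk */
    return 0;
}
```

Under those assumptions "08:71" decodes to major 8, minor 113, disk
index 7, partition 1, i.e. the first partition on the eighth disk, and
"08:11" decodes to minor 17, which is one of the minors listed above.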

Any guidance on how to pin this down would be greatly appreciated.

The other question is what is going on with host 2, channel 0, id 6.  I'll
put a SCSI bus analyzer on that and see if I can find the cause of the
2603's (I assume those are the ASC/ASCQ values, correct?), although they
appear on the successfully recovered error injects as well as the bad ones.
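
An alternative reading of "return code = 26030000" is that it is the
mid-layer's 32-bit result word printed in hex, not sense data.  The
sketch below splits it the way I remember the status_byte(), msg_byte(),
host_byte() and driver_byte() macros in scsi.h doing it: driver byte in
bits 24-31, host byte in bits 16-23, message byte in bits 8-15, status
in the low byte.  The constant values are from memory and are labelled
as assumptions; please check them against the header for this kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, recalled from scsi.h; verify before relying on them. */
#define ASSUMED_DID_TIME_OUT    0x03  /* host byte: command timed out */
#define ASSUMED_DRIVER_TIMEOUT  0x06  /* driver byte, low nibble */
#define ASSUMED_SUGGEST_ABORT   0x20  /* driver byte, high nibble */

/* Hypothetical helper type: the four bytes of a result word. */
struct scsi_result {
    unsigned status;  /* bits 0-7, raw (unshifted) status byte */
    unsigned msg;     /* bits 8-15 */
    unsigned host;    /* bits 16-23 */
    unsigned driver;  /* bits 24-31 */
};

/* Split a 32-bit result word into its four bytes. */
static struct scsi_result split_result(unsigned long result)
{
    struct scsi_result r;

    r.status = (unsigned)(result & 0xff);
    r.msg    = (unsigned)((result >> 8) & 0xff);
    r.host   = (unsigned)((result >> 16) & 0xff);
    r.driver = (unsigned)((result >> 24) & 0xff);
    return r;
}

/* Print the decoded fields of one result word. */
static void print_result(unsigned long result)
{
    struct scsi_result r = split_result(result);

    assert(result <= 0xffffffffUL);
    printf("driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n",
           r.driver, r.host, r.msg, r.status);
}
```

Under this layout 0x26030000 splits into driver byte 0x26, host byte
0x03, and zero message and status bytes.  If the assumed constants are
right, that would be a host-level timeout with a suggestion to abort,
which would fit the "abort ... timed out - resetting" lines rather than
anything reported by the target.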

Thanks,
Steve


