[hercules is a s370/s390/z900 emulator, see
http://www.conmicro.cx/hercules for details].

When running under hercules (s390 mode), attempting to run
dasdfmt in a recent redhat 7.2 image (Linux version 2.4.9-21BOOT
([EMAIL PROTECTED]) (gcc version 2.95.3 20010315
(release)) #1 SMP Fri Jan 18 16:50:02 CET 2002) eventually
results in message `kernel BUG at dasd.c:1758'.

This message is issued because the cqr has an invalid status
(CQR_STATUS_EMPTY) when the bottom half routine `dasd_run_bh'
calls `dasd_process_queues'.

I have done a rather tedious instruction trace analysis to
try and pinpoint the problem.  Basically, `dasd_format' (for
ckd devices) does the following (edited for brevity);

   while ( ... ) {
     req = dasd_eckd_format_device (...);
     rc = dasd_sleep_on_req (req);
     dasd_free_request (...);
   }

`dasd_eckd_format_device' builds (fills) the cqr with the
channel program (to format a track), etc; `dasd_sleep_on_req'
causes the i/o to get asynchronously scheduled and waits until
the i/o request is complete (done), and `dasd_free_request'
performs some cleanup.

Two bottom half routines are dispatched for an i/o request,
the first schedules the i/o, and the second processes the
completed i/o (and wakes up `dasd_sleep_on_req').

dasd_sleep_on_req performs the following (again edited for brevity)

   s390irq_spin_lock_irqsave (...);
   dasd_chanq_enq (...);
   dasd_schedule_bh (...);
   do {
     s390irq_spin_unlock_irqrestore (...);
     wait_event ( device->wait_q,
                  (((cs = req->status) == CQR_STATUS_DONE) ...);
     s390irq_spin_lock_irqsave (...);
   } while (cs != CQR_STATUS_DONE ...);

`s390irq_spin_lock_irqsave', among other things, disables the psw;
likewise, `s390irq_spin_unlock_irqrestore' renables the psw for
interrupts.  There is a small window of opportunity between the
unlock and the wait_event that the i/o actually completes and
dasd_sleep_on_req never actually waits, making the second bottom
half routine superfluous.

For example, immediately after the `SSM' instruction in the unlock
is executed an external interrupt could occur.  The first bottom
half routine is then dispatched scheduling the i/o (dasd_run_bh ->
dasd_start_IO -> do_IO -> SSCH).  before the instruction after the SSM
instruction can be executed, the i/o interrupt could occur.
`dasd_int_handler' sets the cqr status to DONE and schedules the
second bottom half routine.  Now `dasd_sleep_on_req' doesn't
wait at all because `wait_event' doesn't wait if the specified
condition is true.  Control returns back to `dasd_format',
which calls `dasd_free_request', then `dasd_eckd_format_device' which
zeroes the cqr (effectively setting the status to EMPTY).  If an
external interrupt now occurs then the second bottom half routine
runs and the bug occurs because it doesn't expect an EMPTY status.

Perhaps dasd_sleep_on_req should always wait (__wait_event instead)
or the second bottom half routine could recognize that
`dasd_sleep_on_req' is not waiting.

Below I have included a more detailed trace of the sequence of events
(but no where near the volume of the 100000 line instruction trace
I was working with ;-)  The digits represent dispatchable work units.

One phenomenon of hercules is that i/o can appear to complete much
sooner than on a real machine.  It is possible for the i/o interrupt
to be pending as soon as ssch completes.  We have kludged around the
problem by coding a wait before marking the i/o interrupt pending.

Thanks,
Greg Smith


0     dasd_eckd_format_device
0       dasd_alloc_request
0         ccw_alloc_request
0           CQR_STATUS_EMPTY
0         return /*ccw_alloc_request*/
0       return /*dasd_alloc_request*/
0       CQR_STATUS_FILLED
0     return /*dasd_eckd_format_device*/

0     dasd_sleep_on_req
0       CQR_STATUS_QUEUED [dasd_chanq_enq]
0       dasd_schedule_bh
0       return /*dasd_schedule_bh*/
0       SSM [s390irq_spin_unlock_irqrestore]

1     *** ext interrupt  ***
1     dasd_run_bh
1       dasd_process_queues
1         dasd_start_IO
1           SSCH
1           CQR_STATUS_IN_IO
1         return /*dasd_start_IO
1       return /*dasd_process_queues*/

2     *** i/o interrupt  ***
2     TSCH
2     dasd_int_handler
2       CQR_STATUS_DONE
2       dasd_schedule_bh
2       return /*dasd_schedule_bh*/
2     return /*dasd_int_handler*/
2     TPI
2     *** interrupt exit ***

1     return /*dasd_run_bh*/
1     *** interrupt exit ***

0     return /*dasd_sleep_on_req*/

0     dasd_free_request
0     return /*dasd_free_request*/

0     dasd_eckd_format_device
0       dasd_alloc_request
0         ccw_alloc_request
0           CQR_STATUS_EMPTY

3     *** ext interrupt  ***
3     dasd_run_bh
3       dasd_process_queues
3         BUG()

Reply via email to