:Peter Avalos <[EMAIL PROTECTED]> added the comment:
:
:Got another one... not sure if it's going to help, though:
:...
:boot() called on cpu#1
:Uptime: 2d12h22m33s
:
:dumping to dev #da/0x20001, blockno 378927
:dump devstat_end_transaction: HELP!! busy_count for da1 is < 0 (-1)!
:LWKT_WAIT_IPIQ WARNING! 0 wait 1 (-3)
:SECONDARY PANIC ON CPU 0 THREAD 0xc0354e04

    Was this after your ahd logic fix?

    That secondary panic can only happen if devstat_end_transaction() is
    called too many times.  The INTRANSIT panic implies a message being
    replied to twice, or a message being corrupted prior to being replied.
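    To make the arithmetic concrete: the busy count goes up once when an
    I/O starts and down once when it ends, so a single extra completion of
    the same transaction is exactly what drives it to -1.  This is a
    minimal model with hypothetical names (model_devstat,
    model_start_transaction, model_end_transaction), not the kernel's
    devstat code:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for a devstat record; the real
 * structure tracks much more than this. */
struct model_devstat {
    const char *name;
    int busy_count;     /* I/Os started but not yet finished */
};

/* Called when an I/O is issued to the device. */
static void
model_start_transaction(struct model_devstat *ds)
{
    assert(ds != NULL);
    ds->busy_count++;
}

/* Called when an I/O completes.  Returns 0 normally, or -1 if more
 * completions than starts have been seen (the counter went negative). */
static int
model_end_transaction(struct model_devstat *ds)
{
    assert(ds != NULL);
    ds->busy_count--;
    if (ds->busy_count < 0) {
        fprintf(stderr, "busy_count for %s is < 0 (%d)!\n",
            ds->name, ds->busy_count);
        return -1;
    }
    return 0;
}
```

    One start followed by two ends leaves busy_count at -1, matching the
    "(-1)" in the dump message.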

    This implies that the BIO related to a transaction is being replied
    to twice.  I've never seen this before in my life, so I think it must
    be specific to the AHD driver.  I'll bet the driver is trying to
    complete an I/O twice.

    Let's try to catch the problem earlier and maybe get a more meaningful
    backtrace.  Make sure you have options DDB_TRACE set (I think you
    do).  I have committed some code to lwkt_msgport.c (1.44) which tries
    to catch the MSGF_INTRANSIT flag on the originating cpu.
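    The general idea of flagging a double reply at the call site that
    makes it, instead of later when the reply is drained somewhere else,
    can be sketched with a flag on the message itself.  This is a
    hypothetical single-threaded model, NOT the code committed to
    lwkt_msgport.c: the MODEL_MSGF_* names and values and the model_*
    functions are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits; these are NOT the real LWKT values. */
#define MODEL_MSGF_DONE      0x0001  /* reply has been received */
#define MODEL_MSGF_INTRANSIT 0x0002  /* reply is on its way back */

struct model_msg {
    unsigned int flags;
};

/* Reply to a message.  A correct caller replies exactly once; a second
 * reply is caught here, in the context that made the mistake, so a
 * backtrace taken at this point shows the offending caller.
 * Returns 0 on success, -1 on a double reply. */
static int
model_reply(struct model_msg *msg)
{
    assert(msg != NULL);
    if (msg->flags & (MODEL_MSGF_DONE | MODEL_MSGF_INTRANSIT)) {
        fprintf(stderr, "model_reply: msg %p replied twice (flags %#x)\n",
            (void *)msg, msg->flags);
        return -1;
    }
    msg->flags |= MODEL_MSGF_INTRANSIT;
    return 0;
}

/* Runs on the receiving side once the reply has arrived. */
static void
model_reply_arrived(struct model_msg *msg)
{
    assert(msg != NULL);
    msg->flags &= ~MODEL_MSGF_INTRANSIT;
    msg->flags |= MODEL_MSGF_DONE;
}
```

    In the real system the reply may be processed on another cpu, so a
    plain flag test like this is only illustrative of where the check
    sits, not of how it is made safe.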

    --

    Another data point... you got the ahd_run_qoutfifo() panic again
    as well, which implies that ahd_run_qoutfifo() was running when the
    original panic occurred.  I think there's an issue in AHD's transaction
    processing.  The driver does a LOT of work on an scb before it frees
    it by calling ahd_free_scb().  Somewhere in there a recursion is
    happening.
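    One cheap way to confirm a recursion like this is a re-entrancy guard
    around the completion loop.  The sketch below is a single-threaded
    model with hypothetical names (model_softc, model_run_qoutfifo,
    model_reentrant_done); it does not reflect the actual structure of
    ahd_run_qoutfifo(), and in the kernel a concurrent entry from another
    cpu would need a lock rather than a plain flag.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical controller state; only the guard field matters here. */
struct model_softc {
    int in_qoutfifo;    /* nonzero while the completion loop runs */
    int completions;    /* number of entries processed so far */
};

/* Hypothetical per-entry completion callback. */
typedef void (*model_done_fn)(struct model_softc *);

/* Drain 'n' completed entries, calling 'done' for each one.
 * Returns -1 if the loop is entered recursively, 0 otherwise. */
static int
model_run_qoutfifo(struct model_softc *sc, int n, model_done_fn done)
{
    assert(sc != NULL);
    if (sc->in_qoutfifo) {
        fprintf(stderr, "model_run_qoutfifo: recursive entry\n");
        return -1;
    }
    sc->in_qoutfifo = 1;
    for (int i = 0; i < n; i++) {
        sc->completions++;
        if (done != NULL)
            done(sc);
    }
    sc->in_qoutfifo = 0;
    return 0;
}

/* Example of a buggy completion path that re-enters the loop; the
 * result of the nested call is saved so it can be inspected. */
static int model_nested_result;

static void
model_reentrant_done(struct model_softc *sc)
{
    model_nested_result = model_run_qoutfifo(sc, 1, NULL);
}
```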

    It may be beneficial to add a 'processing in progress' flag to the scb
    in ahd_done() and assert that the flag is not set, to catch double calls
    to ahd_done() on the same scb earlier.  Something like what I have
    below may help.
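    Stripped of the driver details, that guard amounts to: test the bit,
    set the bit, and clear it again when the scb goes back on the free
    list.  A userland sketch with hypothetical names (model_scb,
    model_done, model_free_scb); it returns an error where the kernel
    version would panic via KKASSERT():

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in mirroring the bit value used in the patch. */
#define MODEL_SCB_RUNNINGDONE 0x20000

struct model_scb {
    unsigned int flags;
    int tag;
};

/* Completion entry point.  Returns -1 if this scb has already been
 * completed (a double call), 0 otherwise. */
static int
model_done(struct model_scb *scb)
{
    assert(scb != NULL);
    if (scb->flags & MODEL_SCB_RUNNINGDONE) {
        fprintf(stderr, "model_done: scb %d completed twice\n", scb->tag);
        return -1;
    }
    scb->flags |= MODEL_SCB_RUNNINGDONE;
    /* ... hand the I/O back to the upper layer here ... */
    return 0;
}

/* Return the scb to the free pool.  The guard bit must be cleared so
 * the next command that reuses this scb is not falsely flagged. */
static void
model_free_scb(struct model_scb *scb)
{
    assert(scb != NULL);
    scb->flags &= ~MODEL_SCB_RUNNINGDONE;
}
```

    Whatever recycles an scb for reuse has to clear the bit; otherwise the
    next legitimate completion on that scb would trip the assertion.  It
    is worth confirming that the driver's free path resets scb->flags
    before relying on the check.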

                                        -Matt
                                        Matthew Dillon 
                                        <[EMAIL PROTECTED]>

Index: aic79xx.h
===================================================================
RCS file: /cvs/src/sys/dev/disk/aic7xxx/aic79xx.h,v
retrieving revision 1.2
diff -u -p -r1.2 aic79xx.h
--- aic79xx.h   17 Jun 2003 04:28:21 -0000      1.2
+++ aic79xx.h   4 Jul 2007 19:47:50 -0000
@@ -589,12 +589,13 @@   SCB_EXPECT_PPR_BUSFREE  = 0x01000,
        SCB_PKT_SENSE           = 0x02000,
        SCB_CMDPHASE_ABORT      = 0x04000,
        SCB_ON_COL_LIST         = 0x08000,
-       SCB_SILENT              = 0x10000 /*
+       SCB_SILENT              = 0x10000,/*
                                           * Be quiet about transmission type
                                           * errors.  They are expected and we
                                           * don't want to upset the user.  This
                                           * flag is typically used during DV.
                                           */
+       SCB_RUNNINGDONE         = 0x20000
 } scb_flag;
 
 struct scb {
Index: aic79xx_osm.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/aic7xxx/aic79xx_osm.c,v
retrieving revision 1.13
diff -u -p -r1.13 aic79xx_osm.c
--- aic79xx_osm.c       4 Jun 2007 17:21:55 -0000       1.13
+++ aic79xx_osm.c       4 Jul 2007 19:49:48 -0000
@@ -196,6 +196,9 @@ ahd_done(struct ahd_softc *ahd, struct s
 {
        union ccb *ccb;
 
+       KKASSERT((scb->flags & SCB_RUNNINGDONE) == 0);
+       scb->flags |= SCB_RUNNINGDONE;
+
        CAM_DEBUG(scb->io_ctx->ccb_h.path, CAM_DEBUG_TRACE,
                  ("ahd_done - scb %d\n", SCB_GET_TAG(scb)));
 
