On Wed, Aug 13, 2003 at 01:41:14PM -0400, Nick Fisher wrote:
> > As you surmise, eip is the interesting register. Sort the
> > contents of /proc/ksyms and see where it falls. In my /proc/ksyms (which
> > won't match yours!), I see these entries bracketing that value:
> >
> >   c02f04b0 task_read_24_Rsmp_ae3fb3f3
> >   c02f3870 proc_ide_read_geometry_Rsmp_50fed6f7
> >
> > so if I saw that address in my dump, I'd know the failure was in a routine
> > named task_read_24(). (The trailing _Rsmp_ae3fb3f3 is module versioning.)
> Right..... well from my sorted ksyms...
> 
> c02ea460 scsi_malloc_R1cce3f92
> c02ea598 scsi_free_R475dddfa
> c0306a4c register_cdrom_R5a61744f
> c0306d20 unregister_cdrom_R703d3575
> 
> So I'm guessing that the problem is in scsi_free() yes? That would explain
> why I keep having the problem with all my kernels. All my kernels have the
> aic7xxx driver for my card.....
> 
> How can I tell where scsi_free() comes from? I'm guessing that it's from
> the aic7xxx driver but how can I tell?

When you find a suspect routine, grep the kernel sources for it.
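
To find the suspect in the first place, the bracketing lookup quoted above
can be scripted. This is only a minimal sketch, not a polished tool: the
function names, the suffix pattern, and the example eip value 0xc02ea5c0
are my own assumptions for illustration (the sample symbols are the ones
you posted). It parses ksyms-style lines, strips the versioning suffix,
and returns the nearest symbol at or below the address.

```python
import bisect
import re

# Assumed pattern for the versioning suffixes seen in this thread,
# e.g. "_R475dddfa" or "_Rsmp_ae3fb3f3".
VERSION_SUFFIX = re.compile(r"_R(?:smp_)?[0-9a-f]{8}$")

# Symbols copied from the sorted ksyms output quoted in this thread.
SAMPLE = """\
c02ea460 scsi_malloc_R1cce3f92
c02ea598 scsi_free_R475dddfa
c0306a4c register_cdrom_R5a61744f
c0306d20 unregister_cdrom_R703d3575
"""


def parse_ksyms(text):
    """Return a sorted list of (address, name) pairs from ksyms-style text."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            addr = int(fields[0], 16)
        except ValueError:
            continue  # first field is not a hex address
        symbols.append((addr, VERSION_SUFFIX.sub("", fields[1])))
    symbols.sort()
    return symbols


def lookup(symbols, eip):
    """Return (name, offset) for the last symbol at or below eip, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip lies below every known symbol
    addr, name = symbols[i]
    return name, eip - addr


if __name__ == "__main__":
    # 0xc02ea5c0 is a made-up eip, chosen only to land after scsi_free.
    print(lookup(parse_ksyms(SAMPLE), 0xC02EA5C0))
```

On a real system you would feed it the contents of /proc/ksyms instead of
SAMPLE. Keep in mind that /proc/ksyms lists only exported symbols, so a
large offset from the nearest symbol suggests the fault is really in some
non-exported routine that sits after it.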

> > I believe a stack traceback also appears in the NMI Watchdog output -
> > it's sometimes interesting to construct a traceback by gathering some
> > of those addresses.
> I put everything I found on the console in the mail..... so I'm not sure
> about the stack trace.....
> 
> > The last time I used this technique, BTW, I identified some buggy SCSI
> > module code.
> Hummmmm.... sounds familiar....

I don't remember if I bombed in scsi_free() - the bug I found was
elsewhere, but the actual meltdown might have happened in a call to
scsi_free(). A crash in memory allocation code usually reflects mistakes
made elsewhere - which is why it's often useful to build several layers
of traceback.

> 
> > After some grueling detective work, I found a message
> > somewhere that said, "oops, I forgot to propagate my Adaptec fix from
> > this aic79xx module to this aic7xxx module"... found the patch, applied
> > it to my Gentoo sources, and was back in business.
> I don't suppose that patch is still missing from the gentoo sources, is it?
> I'm guessing not.... that would be *too* easy.
> Without getting you to do my work for me, where should I go looking for
> things relating to this? What should I look for?

It is still missing - gentoo-sources is still at the release that I
patched.  I assumed Gentoo would prefer to track patches released from
kernel.org rather than take them ad hoc from users, but maybe I was just
being lazy.

So I attach the patch here - for you, and for the Gentoo maintainers if
they're interested. Not sure if it'll fix your problem, but it's worth a try.

=======================================================================
--- aic7xxx_osm.h       2003-08-13 14:32:16.000000000 -0400
+++ /usr/src/linux/drivers/scsi/aic7xxx/aic7xxx_osm.h   2003-06-30 14:06:15.000000000 -0400
@@ -737,7 +737,9 @@
         * trade the io_request_lock for our per-softc lock.
         */
 #if AHC_SCSI_HAS_HOST_LOCK == 0
-       ahc_lock(ahc, flags);
+       /* ahc_lock(ahc, flags); */
+       spin_unlock(&io_request_lock);
+       spin_lock(&ahc->platform_data->spin_lock);
 #endif
 }
 
@@ -745,7 +747,9 @@
 ahc_midlayer_entrypoint_unlock(struct ahc_softc *ahc, unsigned long *flags)
 {
 #if AHC_SCSI_HAS_HOST_LOCK == 0
-       ahc_unlock(ahc, flags);
+       /* ahc_unlock(ahc, flags); */
+       spin_unlock(&ahc->platform_data->spin_lock);
+       spin_lock(&io_request_lock);
 #endif
 }
 
=======================================================================

Nathan Meyers
[EMAIL PROTECTED]

--
[EMAIL PROTECTED] mailing list