On Wed, Aug 13, 2003 at 01:41:14PM -0400, Nick Fisher wrote:
> > As you surmise, eip is the interesting register. Sort the
> > contents of /proc/ksyms and see where it falls. In my /proc/ksyms (which
> > won't match yours!), I see these entries bracketing that value:
> >
> > c02f04b0 task_read_24_Rsmp_ae3fb3f3
> > c02f3870 proc_ide_read_geometry_Rsmp_50fed6f7
> >
> > so if I saw that address in my dump, I'd know the failure was in a routine
> > named task_read_24(). (The trailing _Rsmp_ae3fb3f3 is module versioning.)
> Right..... well from my sorted ksyms...
>
> c02ea460 scsi_malloc_R1cce3f92
> c02ea598 scsi_free_R475dddfa
> c0306a4c register_cdrom_R5a61744f
> c0306d20 unregister_cdrom_R703d3575
>
> So I'm guessing that the problem is in scsi_free() yes? That would explain
> why I keep having the problem with all my kernels. All my kernels have the
> aic7xxx driver for my card.....
>
> How can I tell where scsi_free() comes from? I'm guessing that it's from
> the aic7xxx driver but how can I tell?
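When you find a suspect routine, grep the kernel sources for it, e.g. run
grep -rn scsi_free over /usr/src/linux/drivers/scsi. The file that defines
it tells you which driver or subsystem it belongs to.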
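If sorting and eyeballing ksyms gets tedious, the lookup can be sketched in a few lines of Python. This is only a rough sketch: the function names are mine, the eip value below is made up (the real one isn't shown in this mail), and the sample table is just the four entries quoted above. Keep in mind that /proc/ksyms lists only exported symbols, so the preceding entry is the nearest exported symbol at or below the address. When the gap to the next entry is large (over 100 KB between scsi_free and register_cdrom here), the failure may well be in an unexported routine somewhere in between.

```python
import bisect
import re

# The four entries quoted earlier in this thread; a real run would read
# the whole of /proc/ksyms instead.
SAMPLE_KSYMS = """\
c02ea460 scsi_malloc_R1cce3f92
c02ea598 scsi_free_R475dddfa
c0306a4c register_cdrom_R5a61744f
c0306d20 unregister_cdrom_R703d3575
"""

# Hypothetical eip, chosen only to illustrate the lookup.
HYPOTHETICAL_EIP = 0xC02EA5C0


def parse_ksyms(text):
    """Parse ksyms-style lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # skip anything that is not "hexaddr name ..."
        symbols.append((address, fields[1]))
    symbols.sort()
    return symbols


def strip_version(name):
    """Drop a module-versioning suffix like _R475dddfa or _Rsmp_ae3fb3f3."""
    return re.sub(r"_R(?:smp_)?[0-9a-f]{8}$", "", name)


def bracket(symbols, address):
    """Return the (preceding, following) entries around address.

    Either element is None when address lies outside the table.
    """
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, address)
    before = symbols[i - 1] if i > 0 else None
    after = symbols[i] if i < len(symbols) else None
    return before, after


symbols = parse_ksyms(SAMPLE_KSYMS)
before, after = bracket(symbols, HYPOTHETICAL_EIP)
print("at or after:", strip_version(before[1]))
print("before:     ", strip_version(after[1]))
```

The stripped name is what you would then grep for in the kernel sources.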
> > I believe a stack traceback also appears in the NMI Watchdog output -
> > it's sometimes interesting to construct a traceback by gathering some
> > of those addresses.
> I put everything I found on the console in the mail..... so I'm not sure
> about the stack trace.....
>
> > The last time I used this technique, BTW, I identified some buggy SCSI
> > module code.
> Hummmmm.... sounds familiar....
I don't remember if I bombed in scsi_free() - the bug I found was
elsewhere, but the actual meltdown might have happened in a call to
scsi_free(). Bombing in memory allocation code usually reflects mistakes
made elsewhere - which is why it's often useful to build several layers
of traceback.
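Building those layers by hand means repeating the same lookup for each address in the dump. A minimal sketch of automating that in Python follows. It assumes the trace addresses appear in the dump as [<c02ea5c0>], which is my recollection of the 2.4 format and may not match your console output. The trace string is invented for illustration, the function names are mine, and the symbol table is only the four ksyms entries quoted in this thread.

```python
import bisect
import re

# Symbol entries quoted in this thread; a real run would use all of /proc/ksyms.
KSYMS_EXCERPT = """\
c02ea460 scsi_malloc_R1cce3f92
c02ea598 scsi_free_R475dddfa
c0306a4c register_cdrom_R5a61744f
c0306d20 unregister_cdrom_R703d3575
"""

# Invented trace text, assuming the "[<address>]" bracket format.
EXAMPLE_TRACE = "Call Trace: [<c02ea5c0>] [<c0306a60>]"


def load_table(ksyms_text):
    """Return a sorted list of (address, name) from well-formed ksyms lines."""
    rows = (line.split() for line in ksyms_text.splitlines())
    return sorted((int(f[0], 16), f[1]) for f in rows if len(f) >= 2)


def resolve(table, address):
    """Map an address to 'symbol+0xoffset', or None if below the first symbol."""
    i = bisect.bisect_right([a for a, _ in table], address) - 1
    if i < 0:
        return None
    base, name = table[i]
    return "%s+0x%x" % (name, address - base)


def symbolic_trace(table, trace_text):
    """Resolve every [<hexaddr>] found in trace_text, in order of appearance."""
    found = re.findall(r"\[<([0-9a-fA-F]{8})>\]", trace_text)
    return [resolve(table, int(h, 16)) for h in found]


for frame in symbolic_trace(load_table(KSYMS_EXCERPT), EXAMPLE_TRACE):
    print(frame)
```

A large offset is a hint that the address really belongs to an unexported routine that follows the named symbol, since ksyms only knows about exported ones.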
>
> > After some grueling detective work, I found a message
> > somewhere that said, "oops, I forgot to propagate my Adaptec fix from
> > this aic79xx module to this aic7xxx module"... found the patch, applied
> > it to my Gentoo sources, and was back in business.
> I don't suppose that patch is still missing from the gentoo sources is it?
> I'm guessing not.... that would be *too* easy.
> Without getting you to do my work for me, where should I go looking for
> things relating to this? What should I look for?
It's still missing - gentoo-sources is still at the release that I patched.
I figured Gentoo would prefer to track patches released from kernel.org
rather than get them ad hoc from users, but maybe I was just being lazy.
So I attach the patch here - for you, and for the Gentoo maintainers if
they're interested. Not sure if it'll fix your problem, but it's worth a try.
=======================================================================
--- aic7xxx_osm.h 2003-08-13 14:32:16.000000000 -0400
+++ /usr/src/linux/drivers/scsi/aic7xxx/aic7xxx_osm.h 2003-06-30 14:06:15.000000000 -0400
@@ -737,7 +737,9 @@
* trade the io_request_lock for our per-softc lock.
*/
#if AHC_SCSI_HAS_HOST_LOCK == 0
- ahc_lock(ahc, flags);
+ /* ahc_lock(ahc, flags); */
+ spin_unlock(&io_request_lock);
+ spin_lock(&ahc->platform_data->spin_lock);
#endif
}
@@ -745,7 +747,9 @@
ahc_midlayer_entrypoint_unlock(struct ahc_softc *ahc, unsigned long *flags)
{
#if AHC_SCSI_HAS_HOST_LOCK == 0
- ahc_unlock(ahc, flags);
+ /* ahc_unlock(ahc, flags); */
+ spin_unlock(&ahc->platform_data->spin_lock);
+ spin_lock(&io_request_lock);
#endif
}
=======================================================================
Nathan Meyers