Mark Hahn wrote:
> > First results from testing my patch are in. If you were planning on
> > trying it but haven't yet, please try this....
>
> you've removed the spinlock. do you know why it was there
> in the first place?
Spinlocks protect "critical regions". Usually, however, when the
operations are correctly ordered, there is no critical region.
For example, when I'm adding items to a linked list,
    tail->next = curitem;
    curitem->next = NULL;
requires a lock against "people" reading this list. Otherwise they
would possibly read past the end of the list, because my
"curitem->next" pointer might be pointing to the middle of nowhere.
If I turn these around, there is no problem with others "reading" this
list.
    curitem->next = NULL;
    tail->next = curitem;
Now, if someone were to interrupt and read the list at any point (before,
in the middle, or after), the list is in a valid state.
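
A minimal sketch of that ordering, not code from the patch: the node
type and the function names are made up for illustration, and a single
writer is assumed. The argument only holds if the two stores really
reach memory in program order, so the sketch uses C11 atomics with
release/acquire ordering to say so explicitly. Kernel code would use
its own barrier primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct item {
    int value;
    _Atomic(struct item *) next;
};

/* Append curitem after tail. Assumes one writer; readers may run at any time. */
static void list_append(struct item *tail, struct item *curitem)
{
    assert(tail != NULL && curitem != NULL);

    /* Step 1: terminate the new node while no reader can reach it yet. */
    atomic_store_explicit(&curitem->next, (struct item *)NULL,
                          memory_order_relaxed);

    /* Step 2: publish the node. The release store keeps step 1 ahead of it. */
    atomic_store_explicit(&tail->next, curitem, memory_order_release);
}

/* Reader: count the nodes. It sees either the old list or the new one. */
static size_t list_count(struct item *head)
{
    size_t n = 0;

    for (struct item *p = head; p != NULL;
         p = atomic_load_explicit(&p->next, memory_order_acquire))
        n++;
    return n;
}
```

A reader that runs between step 1 and step 2 simply does not see the
new node yet; it never follows an uninitialised "next" pointer.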
In the case of the APIC error interrupt, I THINK that the APIC is
forbidden from interrupting us at LEAST until I've done "ack_apic".
After that, the code is no longer a critical section. It's fine if it
gets interrupted by itself (the only "other" user of that specific
spinlock!)
Hmm. HMMMMMMM.
Scenario:

   processor 1:                      processor 2:
   smp_error_interrupt()             Happy happy!
   take spinlock
   do some things.
   printk something                  Still happy!
     -> Another error happens
   smp_error_interrupt()
   try to take spinlock  <--------------
                                     Happy.
                                     smp_error_interrupt ()
                                     try to take spinlock
            [ full deadlock! ]

   <---- = This hangs CPU 1!!!!!

Normally the "critical" section would be protected against the OTHER
CPU. Now, however, we simply hang the CPU that happens to get a burst
of APIC errors.
Now, once the first CPU is wedged, the other CPU will shortly also get
an APIC error, and hang in the same spinlock. Deadlock!
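
To make the self-deadlock concrete, here is a small userland sketch. It
is not the kernel's spinlock implementation: "err_lock" and the helper
names are hypothetical, and a try-lock stands in for the real spin so
that the example terminates instead of hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock in the error handler. */
static atomic_flag err_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now ours. */
static bool err_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&err_lock,
                                              memory_order_acquire);
}

static void err_unlock(void)
{
    atomic_flag_clear_explicit(&err_lock, memory_order_release);
}

/*
 * Models the handler being re-entered on the same CPU while the first
 * invocation still holds the lock. Returns true if the nested entry
 * found the lock taken, i.e. a real spin_lock() would spin forever,
 * because the only code that can release the lock is the interrupted
 * invocation underneath it.
 */
static bool nested_entry_would_hang(void)
{
    bool outer = err_trylock();  /* first handler takes the lock */
    bool inner = err_trylock();  /* nested handler, same CPU */

    if (inner)
        err_unlock();            /* not expected: inner got the lock */
    if (outer)
        err_unlock();
    return outer && !inner;
}
```

The nested attempt fails for as long as the outer invocation holds the
lock, and the outer invocation cannot continue until the nested one
returns.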
As the APIC has to do with interrupts, it may not be possible to get a
keyboard interrupt to one of the processors, to be able to get a
register dump....
Anyway, I'd rather have the routine reenter itself, and shit all over
itself (i.e. give garbled output) than that it hangs itself on
something stupid like a reentrance of that apic error interrupt.
Worst that can happen is that the stack is exhausted by a flurry of
these errors, but that's better than hanging a CPU in a spinlock.
You'd get an Ooops about the stack overflow......
Roger.
--
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* Common sense is the collection of *
****** prejudices acquired by age eighteen. -- Albert Einstein ********