Ethan Solomita wrote:
> 
> Jim Houston wrote:
> >
> > The real problem is the non-maskable part of the non-
> > maskable IPI.  When there are multiple processors hitting
> > breakpoints at the same time, you never know how much of
> > the initial entry code the slave processor got to execute
> > before it was hit by the NMI.
> >
>         This is what I explicitly considered fixed by these changes. In kdb(),
> line 1342 is the beginning of the code where kdb_initial_cpu is grabbed.
> After this block, you either are the kdb_initial_cpu, or you entered kdb
> because of the IPI. So the future slave processor could not have gotten
> past this if () clause before it was hit by the NMI.
> 
>         Looking back before this, there are very few lines of code that examine
> global state, and none that modify global state. The few references to
> KDB_STATE before line 1342 can, I believe, all be justified. Either the
> code knows that it is kdb_initial_cpu, or it is DOING_SS, in which case
> we cannot have received an IPI from KDB, or it is HOLD_CPU. HOLD_CPU is
> used to generate "reentry", and I'm not sure why, but it seems harmless.
> 
>          Can you suggest a code path through kdb() which could lead to harm for
> a CPU which hits a breakpoint, fails to win the race for
> kdb_initial_cpu, and gets an IPI?
> 
> > I have a couple of ideas in the works.  First, I wonder about
> > having the kdb_ipi() check if it has interrupted a
> > breakpoint entry.  If it has, it could just set a flag and
> > return.  I might do this with a stack trace back or by
> > setting a flag early in the breakpoint handling (e.g. entry.S).
> 
>         I don't see how this helps -- whoever won the race for kdb_initial_cpu
> is expecting all the CPUs to gather up and enter kdb. I would expect
> that everyone who hits a breakpoint should enter kdb.
> 
> > Ethan, I'm curious if you're using an NMI on the Sparc.
> >
>         Sparc doesn't have an NMI, but the interrupt I use (an IPI) is rarely
> blocked in the kernel. Certainly not blocked by local_irq_save() and
> family.
>         -- Ethan

Hi Ethan,

I have been in hack mode, and I probably have some self-inflicted
problems.  Your analysis seems correct, but I still had problems
with the combination of your patch + the version of kdba_bp.c that
I sent out on Friday.  I did not mean to impugn your patch and
apologize if I have.

The initial enthusiasm wore off once I started putting breakpoints
at places like do_schedule or sys_open.  More often than not, it hung.
I also ran into the panics from processing breakpoints that have
been removed; they are described in the comment before kdba_db_trap().
It still hung even when I did bd instead of bc.

I went on to experiment with splitting kdb_state into separate
variables for per-cpu-private state vs. inter-cpu synchronization.
I was hoping that I could simplify the problem by eliminating the
interactions between most of the flags.  I was worried
about interactions between processors leaving kdb and new 
arrivals.
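To make the idea concrete, here is a minimal single-threaded sketch of that split.  All names (kdb_private, kdb_hold_cpu, kdb_enter) are illustrative, not the real kdb identifiers, and the ownership race is simplified: a real kernel would grab kdb_initial_cpu with an atomic cmpxchg rather than a plain test-and-set.

```c
#include <assert.h>

#define NR_CPUS 4

/* Per-cpu private state: only ever touched by the owning cpu, so it
 * needs no locking and cannot race with other processors. */
enum kdb_private_state { KDB_IDLE, KDB_DOING_SS, KDB_REENTRY };
static enum kdb_private_state kdb_private[NR_CPUS];

/* Inter-cpu synchronization: the only words other cpus may touch. */
static volatile int kdb_initial_cpu = -1;   /* -1: nobody owns kdb       */
static volatile int kdb_hold_cpu[NR_CPUS];  /* initial cpu holds slaves  */

/* Each cpu's entry path: race for ownership, else wait to be released.
 * Returns 1 if this cpu won and should run the kdb command loop. */
static int kdb_enter(int cpu)
{
    /* Simplified: a real implementation would use cmpxchg here. */
    if (kdb_initial_cpu == -1)
        kdb_initial_cpu = cpu;

    if (kdb_initial_cpu == cpu)
        return 1;               /* we own kdb */

    kdb_hold_cpu[cpu] = 1;      /* slave: spin until released */
    return 0;
}
```

The point of the split is that only the two volatile variables above ever need cross-cpu reasoning; everything in kdb_private stays local.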

Regarding NMI racing with normal breakpoints - I want to
solve a larger problem.  If I can avoid the extra layer of nesting,
I will solve the deleted-breakpoint problem.  It seems ugly to
switch to the other cpu, do a stack trace, and see part of kdb
rather than what that cpu was doing.  I would also like to switch
to the other cpu and then single step.  I also worry about what
happens if the NMI interrupts the spinlock which protects
kdb_initial_cpu.

I have some changes maybe 50% done.  I'm using a flag set in
entry.S to detect that the NMI has interrupted the breakpoint
entry.  Hopefully I will have something useful in another day
or so.
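Roughly, the shape I have in mind is the following sketch.  The names (in_bp_entry, kdb_ipi_pending, kdb_ipi) are hypothetical; in the real code the flag would be set in assembly in entry.S, not in C.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical per-cpu flag, set at the very top of the breakpoint
 * entry path (entry.S in the real code) and cleared once the cpu is
 * safely inside kdb(). */
static volatile int in_bp_entry[NR_CPUS];
static volatile int kdb_ipi_pending[NR_CPUS];

static void breakpoint_entry_begin(int cpu) { in_bp_entry[cpu] = 1; }
static void breakpoint_entry_done(int cpu)  { in_bp_entry[cpu] = 0; }

/* NMI/IPI handler: if we interrupted a breakpoint entry in progress,
 * just record the fact and return.  The interrupted breakpoint path
 * later notices kdb_ipi_pending and enters kdb itself, so we avoid
 * the extra layer of nesting entirely.  Returns 1 if it is safe to
 * enter kdb directly from the NMI frame. */
static int kdb_ipi(int cpu)
{
    if (in_bp_entry[cpu]) {
        kdb_ipi_pending[cpu] = 1;
        return 0;               /* deferred to the breakpoint path */
    }
    return 1;
}
```

With this, a slave that was already in the breakpoint entry when the NMI landed never stacks a second kdb frame, so a stack trace from the initial cpu shows what that cpu was actually doing.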

Jim Houston - Concurrent Computer Corp.
