Jim Houston wrote:
> 
> I have been in hack mode, and I probably have some self inflicted
> problems.  Your analysis seems correct, but I still had problems
> with the combination of your patch + the version of kdba_bp.c that
> I sent out on Friday.  I did not mean to impugn your patch and
> apologize if I have.
> 
        I wasn't upset that you were impugning the integrity of my patch 8), I
was just concerned that I'd rush out something that didn't work, and was
anxious to find the cause. After your email I did find a dumb bug. At
line 1374 of kdbmain.c (with my changes) you need to add:

                kdb_initial_cpu = -1;
                KDB_STATE_CLEAR(RECURSE);

        In the early exit case ("hit a breakpoint but can't find a match"),
this code previously ran before kdb_initial_cpu was set, and I didn't
change things to ensure that kdb_initial_cpu is cleared again before
exiting. At this point, my 2p system is stable, but my 8p is still
seeing some issues, which I'm still looking into.

> The initial enthusiasm wore off once I started putting breakpoints
> at places like do_schedule or sys_open.  More often than not, it hung.
> I also ran into the panic processing breakpoints that have been
> removed.  They are described in the comment before kdba_db_trap().
> It still hung even doing bd instead of bc.
> 
        My fix above may help with this, although it's not the only issue.
Setting a breakpoint at cpu_idle+0x20 (which is inside the main loop,
ie. an idle system will hit this constantly) I have no problems on 2p.
On 8p I have no problems with "go" after the breakpoint, but after I
delete the breakpoint and "go", I still have some unreliability with
re-entering kdb.

> I went on to experiment with splitting the kdb_state into separate
> variables for the per-cpu-private vs inter-cpu synchronization.
> I was hoping that I could simplify the problem by eliminating the
> interactions between most of the flags.  I was worried
> about interactions between processors leaving kdb and new
> arrivals.
> 
        It is true that a lot of these flags may not need to be this way. I
took REENTRY and just turned it into a local variable within kdb(),
since the value isn't carried over between calls into kdb() and isn't
used outside of kdb(). There may be other such "opportunities". But as
to interactions, no CPU should set another CPU's state until all CPUs
are under kdb control in kdb_main_loop(), and no CPU should set its own
state until it either owns kdb_initial_cpu or is in kdb() due to an
IPI. If this isn't the case somewhere, then I missed it.

> Regards NMI racing with normal breakpoints -  I want to
> solve a larger problem. If I can avoid the extra layer of nesting,
> I will solve the deleted breakpoint problem.  It seems ugly to
> switch to the other cpu, do a stack trace and see part of kdb
> rather than what that cpu was doing.  I would also like to switch
> to the other cpu and then single step.  I also worry about what
> happens if the NMI interrupts the spinlock which protects
> kdb_initial_cpu.
> 
        My only concern is that this hides the truth of the situation -- the
truth being that, when this CPU entered kdb, the other CPU was really in
kdb too. But there are benefits to this, too. Perhaps the best idea is
to try to avoid setting breakpoints where you are *likely* to hit one
on a second processor before the first has managed to send the IPIs.
        -- Ethan
