Please find a patch in "crash_target: Support for GDB debugging of all tasks" mail thread. Use the latest version 3 of the patch. Thanks, --Alexey
On Wed, Mar 13, 2024 at 10:41 PM Aditya Gupta <adit...@linux.ibm.com> wrote: > Hi Alexey, > > On Wed, Mar 13, 2024 at 12:42:22PM -0700, Alexey Makhalov wrote: > > Hi folks, > > > > I'm working on an improvement in crash_target for gdb to backtrace and > > debug (including local frame variables) all tasks, not just active > > tasks/CPUs > > I will send the patch shortly. > > Thanks for the interest in this. Just in case, with both patch series > combined, it should be possible to backtrace and run info locals, etc, > on arbitrary tasks also. Interested in your patch though :) > > Thanks, > Aditya Gupta > > > > > Thanks, > > --Alexey > > > > On Wed, Mar 13, 2024 at 2:34 AM Aditya Gupta <adit...@linux.ibm.com> > wrote: > > > > > Hi Tao, > > > On Wed, Mar 13, 2024 at 11:23:56AM +0800, Tao Liu wrote: > > > > Hi Aditya, > > > > > > > > On Tue, Mar 12, 2024 at 5:12 PM Aditya Gupta <adit...@linux.ibm.com> > > > wrote: > > > > > > > > > > Hi Tao, > > > > > > > > > > > <...> > > > > > > > > > > > > +crash_target *target = NULL; > > > > > > + > > > > > > void > > > > > > crash_target_init (void) > > > > > > { > > > > > > int nr_cpus = crash_get_nr_cpus(); > > > > > > - crash_target *target = new crash_target (); > > > > > > + target = new crash_target (); > > > > > > > > > > > > /* Own the target until it is successfully pushed. */ > > > > > > target_ops_up target_holder (target); > > > > > > @@ -137,29 +139,35 @@ crash_target_init (void) > > > > > > reinit_frame_cache (); > > > > > > } > > > > > > > > > > > > -/* > > > > > > - * Change gdb's thread context to the thread on given CPU > > > > > > - **/ > > > > > > extern "C" int > > > > > > -gdb_change_cpu_context(unsigned int cpu) > > > > > > +gdb_change_thread_context (ulong task) > > > > > > { > > > > > > - ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu); > > > > > > - inferior *inf = current_inferior (); > > > > > > - thread_info *tp = find_thread_ptid (inf, ptid); > > > > > > - > > > > > > - if (tp == nullptr) > > > > > > + int tried = 0; > > > > > > + inferior* inf = current_inferior(); > > > > > > + int cpu = crash_set_thread(task); > > > > > > + if (cpu < 0) > > > > > > return FALSE; > > > > > > > > > > > > - /* Making sure that crash's context is same */ > > > > > > - set_cpu(cpu, FALSE); > > > > > > - > > > > > > - /* Switch to the thread */ > > > > > > - switch_to_thread(tp); > > > > > > - > > > > > > - /* Fetch/Refresh thread's registers */ > > > > > > - gdb_refresh_regcache(cpu); > > > > > > + ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu); > > > > > > > > > > > > - return TRUE; > > > > > > +retry: > > > > > > + thread_info *tp = find_thread_ptid (inf, ptid); > > > > > > + if (tp == nullptr && !tried) { > > > > > > + thread_info *thread = add_thread_silent(target, > > > > > > + ptid_t(CRASH_INFERIOR_PID, 0, > cpu)); > > > > > > > > > > A minor comment. > > > > > Is it possible to do it without needing 'target' as global ? I see > > > > > there's 'inf->current_top_target()', but it's return type if > > > target_ops, > > > > > don't know if it can be used. > > > > > > > > > Thanks for your comments, however I guess it is not workable for the > > > > code change. What we want is to add a thread to the target. I found > > > > one function "target.c:new_thread", which can be used as > > > > "new_thread(current_inferior(), ptid_t(CRASH_INFERIOR_PID, 0, cpu));" > > > > to replace the "add_thread_silent()" code, except we need to make > > > > "new_thread" to be non-static. > > > > > > > > I prefer to keep the code as current, because I don't see the side > > > > effect of making target as global variable. What do you think? > > > > > > Then there doesn't seem to be an alternative, then I guess we will have > > > to keep the global variable, seems okay to me. > > > > > > > > > > > > > <...> > > > > > > > > > > > > /* > > > > > > * Collect the irq_desc[] entry along with its associated > handler > > > and > > > > > > diff --git a/task.c b/task.c > > > > > > index a405b05..ef79f53 100644 > > > > > > --- a/task.c > > > > > > +++ b/task.c > > > > > > @@ -715,7 +715,8 @@ task_init(void) > > > > > > * crash_target::fetch_registers, so CPU 0's registers are > > > shown as > > > > > > * <unavailable> in gdb mode > > > > > > * */ > > > > > > - gdb_refresh_regcache(0); > > > > > > + for (int i = 0; i < get_cpus_online(); i++) > > > > > > + gdb_refresh_regcache(i); > > > > > > > > > > Does x86 require all regcache(s) to be refreshed at init time ? > > > > > > > > > > Till v4 of my ppc series, I also was doing refresh of all CPUs, but > > > it's > > > > > better to do it when required, and there was a review in v4, patch > 5 > > > > > from Lianbo, quoting it here: > > > > > > > > > > > In addition, I haven't tested how far it takes the time in the > > > > > > multi-core environment, is it possible to implement a smaller > > > > > > granularity refresh operation? For example: When we need switch > to > > > > > > another cpu, execute the refresh operation for corresponding > > > regcache. > > > > > > > > > > CPU 0's refresh is required due to gdb having already tried to > > > > > initialised registers before we initialised crash_target > > > > > > > > > > This might add a small noticeable delay on vmcores from big > systems. > > > > > > > > > > Adds around 4.6 seconds for a ppc64 vmcore with 64 CPUs, in crash > init > > > > > time, while reading it on a fast 48 cpu Power8 system. > > > > > This overhead is less for systems with less CPUs, eg. ~0.43 > seconds, on > > > > > a vmcore of 8 CPUs x86_64 system. > > > > > > > > Yes, thanks for the testing results. There do have performance > > > > downgrades of refreshing all cpu caches. But this will give correct > > > > results when user type "info threads" after showing "crash>": > > > > > > > > With all cpu cache refreshed: > > > > crash> info threads > > > > Id Target Id Frame > > > > 1 CPU 0 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 2 CPU 1 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 3 CPU 2 <unavailable> in ?? () > > > > 4 CPU 3 <unavailable> in ?? () > > > > 5 CPU 4 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 6 CPU 5 <unavailable> in ?? () > > > > 7 CPU 6 <unavailable> in ?? () > > > > * 8 CPU 7 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 9 CPU 8 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 10 CPU 9 <unavailable> in ?? () > > > > 11 CPU 10 <unavailable> in ?? () > > > > 12 CPU 11 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 13 CPU 12 <unavailable> in ?? () > > > > 14 CPU 13 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 15 CPU 14 <unavailable> in ?? () > > > > 16 CPU 15 <unavailable> in ?? () > > > > 17 CPU 16 <unavailable> in ?? () > > > > 18 CPU 17 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 19 CPU 18 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 20 CPU 19 <unavailable> in ?? () > > > > 21 CPU 20 <unavailable> in ?? () > > > > 22 CPU 21 <unavailable> in ?? () > > > > 23 CPU 22 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 24 CPU 23 <unavailable> in ?? () > > > > crash> q > > > > > > > > With only cpu0 refreshed: > > > > crash> info threads > > > > Id Target Id Frame > > > > 1 CPU 0 native_safe_halt () at > > > > arch/x86/include/asm/irqflags.h:54 > > > > 2 CPU 1 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 3 CPU 2 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 4 CPU 3 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 5 CPU 4 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 6 CPU 5 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 7 CPU 6 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > * 8 CPU 7 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 9 CPU 8 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 10 CPU 9 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 11 CPU 10 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 12 CPU 11 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 13 CPU 12 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 14 CPU 13 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 15 CPU 14 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 16 CPU 15 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 17 CPU 16 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 18 CPU 17 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 19 CPU 18 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 20 CPU 19 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 21 CPU 20 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 22 CPU 21 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 23 CPU 22 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > 24 CPU 23 blk_mq_rq_timed_out (req=0xffff880fdb246000, > > > > reserved=reserved@entry=false) at block/blk-mq.c:640 > > > > > > > > As we can see, other cpus[1-6, 8-23] just take the reg cache of > > > > cpu[7], which is incorrect. And if users go further like, "thread 20" > > > > and "gdb bt", it will also give incorrect stack traces. > > > > > > > > The cpu cache will only get refreshed once user type "set <pid>", so > > > > the cpu cache will be refreshed by the <pid> task's context. > > > > > > > > I doubt a user will understand all the details and constraints, so > I'm > > > > afraid the user will be confused by the faulty output. But I also > have > > > > no objection if the performance is the priority. Basically it is a > > > > balance of pays and gains. In addition, since cmd "info" and "thread" > > > > is a command provided by gdb, currently I don't know how to hack > > > > those, so cpu cache can be refreshed when "info threads" or "thread > > > > <num>" have been invoked. > > > > > > > > Do you have any thoughts? > > > > > > I also had faced that issue initially, ie. the other CPUs using up same > > > regcache, if all are not refreshed. > > > While iterating through all threads, gdb switches it's context > > > temporarily, while crash's context remained same, thus causing gdb to > > > get same registers for all threads other than 0. > > > > > > This was solved in patch #3 (synchronise cpu context changes between > > > crash/gdb) > > > in the ppc's 'Improve stack unwind on ppc64' series, by syncing gdb's > > > context with crash. > > > > > > Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ? > > > That will fix it. > > > > > > ``` > > > ---- gdb-10.2/gdb/thread.c.orig > > > -+++ gdb-10.2/gdb/thread.c > > > -@@ -60,7 +60,7 @@ static thread_info *current_thread_; > > > - > > > - #ifdef CRASH_MERGE > > > - /* Function to set cpu, defined by crash-utility */ > > > --extern "C" void set_cpu (int); > > > -+extern "C" void set_cpu(int cpu, int print_context); > > > - #endif > > > - > > > - /* RAII type used to increase / decrease the refcount of each thread > > > -@@ -1098,6 +1098,9 @@ print_thread_info_1 (struct ui_out *uiout, const > > > char *requested_threads, > > > - uiout->table_body (); > > > - } > > > - > > > -+#ifdef CRASH_MERGE > > > -+ int current_cpu = current_thread ? current_thread->ptid.tid() > : 0; > > > -+#endif > > > - for (inferior *inf : all_inferiors ()) > > > - for (thread_info *tp : inf->threads ()) > > > - { > > > -@@ -1113,6 +1116,11 @@ print_thread_info_1 (struct ui_out *uiout, > const > > > char *requested_threads, > > > - > > > - ui_out_emit_tuple tuple_emitter (uiout, NULL); > > > - > > > -+ /* Switch to the thread's CPU in crash */ > > > -+#ifdef CRASH_MERGE > > > -+ set_cpu(tp->ptid.tid(), FALSE); > > > -+#endif > > > -+ > > > ``` > > > > > > 'info threads' keeps changing the gdb context temporarily, while it > > > prints each thread's frame and other information. > > > Correspondingly this changes ensured that crash's context also syncs > > > according to change in gdb's context. > > > > > > Then, when user does 'info threads', it will refresh all regcache since > > > it has to go through each thread. > > > > > > In that case that issue of the same frame being printed for all threads > > > should not happen. > > > > > > That should remove the need to refresh all regcache at init time, and > > > leave it to later when user needs it. Refreshing all regcache at init > > > time is the sure shot way that all regcache are correct, but I guess > > > everyone will anyways do a 'info threads', but we can delay the > overhead > > > to later. What do you think ? > > > > > > Thanks, > > > Aditya Gupta > > > > > > > > > > > Thanks, > > > > Tao Liu > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > Aditya Gupta > > > > > > > > > > > > > > > > > tt->flags |= TASK_INIT_DONE; > > > > > > } > > > > > > @@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint > > > update_gdb_thread) > > > > > > > > > > > > /* change the selected thread in gdb, according to > > > current context */ > > > > > > if (update_gdb_thread) > > > > > > - return > gdb_change_cpu_context(tc->processor); > > > > > > + return gdb_change_thread_context(tc->task); > > > > > > else > > > > > > return TRUE; > > > > > > } else { > > > > > > -- > > > > > > 2.40.1 > > > > > > > > > > > > > > > > > > -- > > > Crash-utility mailing list -- devel@lists.crash-utility.osci.io > > > To unsubscribe send an email to > devel-le...@lists.crash-utility.osci.io > > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ > > > Contribution Guidelines: https://github.com/crash-utility/crash/wiki > > > >
-- Crash-utility mailing list -- devel@lists.crash-utility.osci.io To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ Contribution Guidelines: https://github.com/crash-utility/crash/wiki