[Crash-utility] Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

lijiang Mon, 09 Jun 2025 02:11:18 -0700

On Mon, May 26, 2025 at 5:04 PM <devel-requ...@lists.crash-utility.osci.io>
wrote:


> Date: Fri, 23 May 2025 05:23:40 +0000
> From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio...@nec.com>
> Subject: [Crash-utility] Re: [PATCH] Fix the "ps -m" command shows
>         wrong duration of RU task
> To: Tao Liu <l...@redhat.com>, Ke Yin <k...@redhat.com>
> Cc: "devel@lists.crash-utility.osci.io"
>         <devel@lists.crash-utility.osci.io>
> Message-ID: <23c74e07-d439-422a-bbea-8b2bf49b3...@nec.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> On 2025/05/15 8:43, Tao Liu wrote:
> > Hi Ke Yin,
> >
> > On Wed, May 14, 2025 at 8:58 PM Ke Yin <k...@redhat.com> wrote:
> >>
> >> Hi Tao Liu & Kazu,
> >>
> >> Thanks for replying and sharing your thoughts.
> >>
> >> After a quick review of crash tool code, I found:
> >>
> >> runq -m will call dump_on_rq_milliseconds() to print the amount
> >> of time that the active task on each cpu has been running,
> >> but only for the current running task.
> >>
> >> runq -d will call dump_on_rq_tasks() to print all tasks in the run queue
> >> and the task running on cpu without calling translate_nanoseconds().
> >>
> >> My preliminary idea is to combine these two functions and add a new
> >> parameter, for example -q, to print the tasks on each cpu that has
> >> been waiting in the run queue only. And as well as update doc of runq.
> >>
> >> In short:
> >> runq -q will call new_function which is the modified function based on
> dump_on_rq_tasks() (skip current + translate_nanoseconds).
> >>
> >> What do you think?
>
> I didn't know the "runq -d" option because it's a kind of debugging
> option and has no description in the help page.  Also it searches all
> tasks for ones that have on_rq = 1 and doesn't look very efficient
> (nr_tasks * nr_cpus).  so ideally, maybe a new function should be based
> on dump_runq() than based on dump_on_rq_tasks(), if possible..
>
>
Looks like getting a solution: adding a new option(E.g: runq -q) to achieve
this purpose? Or need more discussion? Any update?

Thanks
Lianbo


> Thanks,
> Kazu
>
>
> >
> > I'm OK with your idea in general. Please check if I understood
> > correctly, your implementation is like:
> > cmd_runq() {
> > ...
> >    if (-d option) {
> >      dump_on_rq_tasks(old path);
> >    } else if (-q option) {
> >      dump_on_rq_tasks(new path);
> >    }
> > }
> >
> > dump_on_rq_tasks(option)
> > {
> >    ...
> >    for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
> >      if (old path) // Old path stay unchanged
> >        dump_task_runq_entry(tc, 0);
> >      else // New path will output your time duration
> >        your_new_function_with_translate_nanoseconds();
> >    }
> > }
> >
> > Thanks,
> > Tao Liu
> >
> >>
> >> Thanks
> >> Kenneth Yin
> >>
> >>
> >>
> >>
> >> On Mon, May 12, 2025 at 1:36 PM Tao Liu <l...@redhat.com> wrote:
> >>>
> >>> Hi Kazu & Kenneth,
> >>>
> >>> Sorry for the late reply, and thanks for your fix and comments!
> >>>
> >>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾 一仁)
> >>> <k-hagio...@nec.com> wrote:
> >>>>
> >>>> On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 2025/04/28 19:38, Kenneth Yin wrote:
> >>>>>> The RU/TASK_RUNNING stat means the task is runnable.
> >>>>>> It is either currently running or on a run queue waiting to run.
> >>>>>>
> >>>>>> Currently, the crash tool uses the "rq_clock -
> sched_info->last_arrival" formula to
> >>>>>> calculate the duration of task in RU state. This is for the
> scenario of a task running on a CPU.
> >>>>>
> >>>>> The "ps -l" and "ps -m" options display what their help text
> describes,
> >>>>> not the duration of task in RU state.  Please see "help ps".
> >>>>>
> >>>>> Also, tasks are sorted by the value, using different values for it
> could
> >>>>> make another confusion.
> >>>>>
> >>>>> The options have been used for a long time with the current code, if
> we
> >>>>> change the semantics of the options, it would be better to be
> careful.
> >>>>> The change might lose a kind of information instead of getting
> another
> >>>>> kind of information.
> >>>>>
> >>>>> On the other hand, I think that the duration of waiting in queue
> might
> >>>>> also be useful information.  I'm not sure how we should display them,
> >>>>> but for example, how about adding a new option or adding a column for
> >>>>> last_queued?
> >>>>
> >>>> I thought of that the "runq" command might be suitable to display the
> >>>> waiting duration, because only tasks in the run queues have it.  For
> >>>> example, extending the "runq -m" option or adding a new option.  just
> my
> >>>> thought.
> >>>>
> >>>> Thanks,
> >>>> Kazu
> >>>>
> >>>>>
> >>>>> What do you think, folks?
> >>>>>
> >>>>> Thanks,
> >>>>> Kazu
> >>>>>
> >>>>>>
> >>>>>> But for the scenario of a task waiting in the  CPU run queue (due
> to some reason
> >>>>>> for example cfs/rt queue throttled), this formula could cause
> misunderstanding.
> >>>>>>
> >>>>>> For example:
> >>>>>> [ 220 10:36:38.026] [RU]  PID: 12345   TASK: ffff8d674ab6b180  CPU:
> 1        COMMAND: "task"
> >>>>>>
> >>>>>> Looking closer:
> >>>>>>
> >>>>>> crash> rq.clock ffff8de438a5acc0
> >>>>>>      clock = 87029229985307234,
> >>>>>>
> >>>>>> crash> task -R sched_info,se.exec_start
> >>>>>> PID: 12345   TASK: ffff8d674ab6b180  CPU: 1  COMMAND: "task"
> >>>>>>      sched_info = {
> >>>>>>       pcount = 33,
> >>>>>>       run_delay = 0,
> >>>>>>       last_arrival = 67983031958439673,
> >>>>>>       last_queued = 87029224561119369
> >>>>>>      },
> >>>>>>      se.exec_start = 67983031958476937,
> >>>>>>
> >>>>>> 67983031         67983031                 87029224
> 87029229
> >>>>>> |<-   running on CPU  ->| <-      IN    ->|<-    waiting in queue
>   ->|
> >>>>>>
> >>>>>> For this scenario, the "task" was waiting in the run queue of the
> CPU only for 5 seconds,
> >>>>>> we should use the "rq_clock - sched_info->last_queued" formula.
> >>>
> >>> Please check if my understanding is correct:
> >>>
> >>> The result you saw is "rq_clock - sched_info->last_arrival == 87029229
> >>> - 67983031 == 19046198"
> >>> The expected result you want is: "rq_clock - sched_info->last_queued
> >>> == 87029229 - 87029224 == 5"
> >>>
> >>> You think the 19046198 value is misleading and should be 5 which only
> >>> contains the waiting in queue duration, am I correct?
> >>>
> >>> I agree with Kazu's idea, that we shouldn't change the existing ps
> >>> cmd's behaviour, and runq is a better alternative for the
> >>> waiting-in-queue duration display.
> >>>
> >>> What do you think? Could you please improve your code as well as an
> >>> updated "help runq" doc for runq?
> >>>
> >>> Thanks,
> >>> Tao Liu
> >>>
> >>>>>>
> >>>>>> We can trust sched_info->last_queued as it is only set when the
> task enters the CPU run queue.
> >>>>>> Furthermore, when the task hits/runs on a CPU or dequeues the CPU
> run queue, it will be reset to 0.
> >>>>>>
> >>>>>> Therefore, my idea is simple:
> >>>>>>
> >>>>>> If a task in RU stat and sched_info->last_queued has value (!= 0),
> >>>>>> it means this task is waiting in the run queue, use "rq_clock -
> sched_info->last_queued".
> >>>>>>
> >>>>>> Otherwise, if a task in RU stat and sched_info->last_queued = 0
> >>>>>> and sched_info->last_arrival has value (it must be), it means this
> task is running on the CPU,
> >>>>>> use "rq_clock - sched_info->last_arrival".
> >>>>>>
> >>>>>> Signed-off-by: Kenneth Yin <k...@redhat.com>
> >>>>>> ---
> >>>>>>     defs.h    |  1 +
> >>>>>>     symbols.c |  2 ++
> >>>>>>     task.c    | 21 +++++++++++++++------
> >>>>>>     3 files changed, 18 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/defs.h b/defs.h
> >>>>>> index 4cf169c..66f5ce4 100644
> >>>>>> --- a/defs.h
> >>>>>> +++ b/defs.h
> >>>>>> @@ -1787,6 +1787,7 @@ struct offset_table {                    /*
> stash of commonly-used offsets */
> >>>>>>       long vcpu_struct_rq;
> >>>>>>       long task_struct_sched_info;
> >>>>>>       long sched_info_last_arrival;
> >>>>>> +    long sched_info_last_queued;
> >>>>>>       long page_objects;
> >>>>>>       long kmem_cache_oo;
> >>>>>>       long char_device_struct_cdev;
> >>>>>> diff --git a/symbols.c b/symbols.c
> >>>>>> index e30fafe..fb5035f 100644
> >>>>>> --- a/symbols.c
> >>>>>> +++ b/symbols.c
> >>>>>> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong
> makestruct)
> >>>>>>                     OFFSET(sched_rt_entity_run_list));
> >>>>>>       fprintf(fp, "       sched_info_last_arrival: %ld\n",
> >>>>>>                     OFFSET(sched_info_last_arrival));
> >>>>>> +    fprintf(fp, "       sched_info_last_queued: %ld\n",
> >>>>>> +            OFFSET(sched_info_last_queued));
> >>>>>>             fprintf(fp, "       task_struct_thread_info: %ld\n",
> >>>>>>                     OFFSET(task_struct_thread_info));
> >>>>>>             fprintf(fp, "             task_struct_stack: %ld\n",
> >>>>>> diff --git a/task.c b/task.c
> >>>>>> index 3bafe79..f5386ac 100644
> >>>>>> --- a/task.c
> >>>>>> +++ b/task.c
> >>>>>> @@ -332,9 +332,12 @@ task_init(void)
> >>>>>>             MEMBER_OFFSET_INIT(task_struct_last_run, "task_struct",
> "last_run");
> >>>>>>             MEMBER_OFFSET_INIT(task_struct_timestamp,
> "task_struct", "timestamp");
> >>>>>>             MEMBER_OFFSET_INIT(task_struct_sched_info,
> "task_struct", "sched_info");
> >>>>>> -    if (VALID_MEMBER(task_struct_sched_info))
> >>>>>> +    if (VALID_MEMBER(task_struct_sched_info)) {
> >>>>>>               MEMBER_OFFSET_INIT(sched_info_last_arrival,
> >>>>>>                       "sched_info", "last_arrival");
> >>>>>> +            MEMBER_OFFSET_INIT(sched_info_last_queued,
> >>>>>> +                    "sched_info", "last_queued");
> >>>>>> +    }
> >>>>>>       if (VALID_MEMBER(task_struct_last_run) ||
> >>>>>>           VALID_MEMBER(task_struct_timestamp) ||
> >>>>>>           VALID_MEMBER(sched_info_last_arrival)) {
> >>>>>> @@ -6035,7 +6038,7 @@ ulonglong
> >>>>>>     task_last_run(ulong task)
> >>>>>>     {
> >>>>>>             ulong last_run;
> >>>>>> -    ulonglong timestamp;
> >>>>>> +    ulonglong timestamp,last_queued;
> >>>>>>
> >>>>>>       timestamp = 0;
> >>>>>>             fill_task_struct(task);
> >>>>>> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
> >>>>>>       } else if (VALID_MEMBER(task_struct_timestamp))
> >>>>>>               timestamp = tt->last_task_read ?
> ULONGLONG(tt->task_struct +
> >>>>>>                       OFFSET(task_struct_timestamp)) : 0;
> >>>>>> -    else if (VALID_MEMBER(sched_info_last_arrival))
> >>>>>> -            timestamp = tt->last_task_read ?
> ULONGLONG(tt->task_struct +
> >>>>>> -                    OFFSET(task_struct_sched_info) +
> >>>>>> -                    OFFSET(sched_info_last_arrival)) : 0;
> >>>>>> +    else if (VALID_MEMBER(sched_info_last_queued))
> >>>>>> +            last_queued = ULONGLONG(tt->task_struct +
> >>>>>> +                    OFFSET(task_struct_sched_info) +
> >>>>>> +                    OFFSET(sched_info_last_queued));
> >>>>>> +            if (last_queued != 0) {
> >>>>>> +                    timestamp = tt->last_task_read ? last_queued :
> 0;
> >>>>>> +            } else if (VALID_MEMBER(sched_info_last_arrival))
> >>>>>> +                            timestamp = tt->last_task_read ?
> ULONGLONG(tt->task_struct +
> >>>>>> +                            OFFSET(task_struct_sched_info) +
> >>>>>> +                            OFFSET(sched_info_last_arrival)) : 0;
> >>>>>>
> >>>>>>             return timestamp;
> >>>>>>     }
> >>>>> --
> >>>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
> >>>>> To unsubscribe send an email to
> devel-le...@lists.crash-utility.osci.io
> >>>>> https://${domain_name}/admin/lists/
> devel.lists.crash-utility.osci.io/
> >>>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
> >>>> --
> >>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
> >>>> To unsubscribe send an email to
> devel-le...@lists.crash-utility.osci.io
> >>>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> >>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
> >>>
> >>
> >>
> >> --
> >> Kenneth Yin
> >> Senior Software Maintenance Engineer
> >> Customer Experience and Engagement
> >> Phone: +86-10-6533-9459
> >> Red Hat China
>

--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

[Crash-utility] Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

Reply via email to