[Crash-utility] Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

Ke Yin Mon, 09 Jun 2025 02:16:51 -0700

Hi Lianbo,

It is waiting on me, I am planning to follow Kazu's idea to add a new
option, for example, runq -q.


On Mon, Jun 9, 2025 at 5:09 PM lijiang <liji...@redhat.com> wrote:

> On Mon, May 26, 2025 at 5:04 PM <devel-requ...@lists.crash-utility.osci.io>
> wrote:
>
>> Date: Fri, 23 May 2025 05:23:40 +0000
>> From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio...@nec.com>
>> Subject: [Crash-utility] Re: [PATCH] Fix the "ps -m" command shows
>>         wrong duration of RU task
>> To: Tao Liu <l...@redhat.com>, Ke Yin <k...@redhat.com>
>> Cc: "devel@lists.crash-utility.osci.io"
>>         <devel@lists.crash-utility.osci.io>
>> Message-ID: <23c74e07-d439-422a-bbea-8b2bf49b3...@nec.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> On 2025/05/15 8:43, Tao Liu wrote:
>> > Hi Ke Yin,
>> >
>> > On Wed, May 14, 2025 at 8:58 PM Ke Yin <k...@redhat.com> wrote:
>> >>
>> >> Hi Tao Liu & Kazu,
>> >>
>> >> Thanks for replying and sharing your thoughts.
>> >>
>> >> After a quick review of crash tool code, I found:
>> >>
>> >> runq -m will call dump_on_rq_milliseconds() to print the amount
>> >> of time that the active task on each cpu has been running,
>> >> but only for the current running task.
>> >>
>> >> runq -d will call dump_on_rq_tasks() to print all tasks in the run
>> queue
>> >> and the task running on cpu without calling translate_nanoseconds().
>> >>
>> >> My preliminary idea is to combine these two functions and add a new
>> >> parameter, for example -q, to print the tasks on each cpu that has
>> >> been waiting in the run queue only. And as well as update doc of runq.
>> >>
>> >> In short:
>> >> runq -q will call new_function which is the modified function based on
>> dump_on_rq_tasks() (skip current + translate_nanoseconds).
>> >>
>> >> What do you think?
>>
>> I didn't know the "runq -d" option because it's a kind of debugging
>> option and has no description in the help page.  Also it searches all
>> tasks for ones that have on_rq = 1 and doesn't look very efficient
>> (nr_tasks * nr_cpus).  so ideally, maybe a new function should be based
>> on dump_runq() than based on dump_on_rq_tasks(), if possible..
>>
>>
> Looks like getting a solution: adding a new option(E.g: runq -q) to
> achieve this purpose? Or need more discussion? Any update?
>
> Thanks
> Lianbo
>
>
>> Thanks,
>> Kazu
>>
>>
>> >
>> > I'm OK with your idea in general. Please check if I understood
>> > correctly, your implementation is like:
>> > cmd_runq() {
>> > ...
>> >    if (-d option) {
>> >      dump_on_rq_tasks(old path);
>> >    } else if (-q option) {
>> >      dump_on_rq_tasks(new path);
>> >    }
>> > }
>> >
>> > dump_on_rq_tasks(option)
>> > {
>> >    ...
>> >    for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
>> >      if (old path) // Old path stay unchanged
>> >        dump_task_runq_entry(tc, 0);
>> >      else // New path will output your time duration
>> >        your_new_function_with_translate_nanoseconds();
>> >    }
>> > }
>> >
>> > Thanks,
>> > Tao Liu
>> >
>> >>
>> >> Thanks
>> >> Kenneth Yin
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, May 12, 2025 at 1:36 PM Tao Liu <l...@redhat.com> wrote:
>> >>>
>> >>> Hi Kazu & Kenneth,
>> >>>
>> >>> Sorry for the late reply, and thanks for your fix and comments!
>> >>>
>> >>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾 一仁)
>> >>> <k-hagio...@nec.com> wrote:
>> >>>>
>> >>>> On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> On 2025/04/28 19:38, Kenneth Yin wrote:
>> >>>>>> The RU/TASK_RUNNING stat means the task is runnable.
>> >>>>>> It is either currently running or on a run queue waiting to run.
>> >>>>>>
>> >>>>>> Currently, the crash tool uses the "rq_clock -
>> sched_info->last_arrival" formula to
>> >>>>>> calculate the duration of task in RU state. This is for the
>> scenario of a task running on a CPU.
>> >>>>>
>> >>>>> The "ps -l" and "ps -m" options display what their help text
>> describes,
>> >>>>> not the duration of task in RU state.  Please see "help ps".
>> >>>>>
>> >>>>> Also, tasks are sorted by the value, using different values for it
>> could
>> >>>>> make another confusion.
>> >>>>>
>> >>>>> The options have been used for a long time with the current code,
>> if we
>> >>>>> change the semantics of the options, it would be better to be
>> careful.
>> >>>>> The change might lose a kind of information instead of getting
>> another
>> >>>>> kind of information.
>> >>>>>
>> >>>>> On the other hand, I think that the duration of waiting in queue
>> might
>> >>>>> also be useful information.  I'm not sure how we should display
>> them,
>> >>>>> but for example, how about adding a new option or adding a column
>> for
>> >>>>> last_queued?
>> >>>>
>> >>>> I thought of that the "runq" command might be suitable to display the
>> >>>> waiting duration, because only tasks in the run queues have it.  For
>> >>>> example, extending the "runq -m" option or adding a new option.
>> just my
>> >>>> thought.
>> >>>>
>> >>>> Thanks,
>> >>>> Kazu
>> >>>>
>> >>>>>
>> >>>>> What do you think, folks?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Kazu
>> >>>>>
>> >>>>>>
>> >>>>>> But for the scenario of a task waiting in the  CPU run queue (due
>> to some reason
>> >>>>>> for example cfs/rt queue throttled), this formula could cause
>> misunderstanding.
>> >>>>>>
>> >>>>>> For example:
>> >>>>>> [ 220 10:36:38.026] [RU]  PID: 12345   TASK: ffff8d674ab6b180
>> CPU: 1        COMMAND: "task"
>> >>>>>>
>> >>>>>> Looking closer:
>> >>>>>>
>> >>>>>> crash> rq.clock ffff8de438a5acc0
>> >>>>>>      clock = 87029229985307234,
>> >>>>>>
>> >>>>>> crash> task -R sched_info,se.exec_start
>> >>>>>> PID: 12345   TASK: ffff8d674ab6b180  CPU: 1  COMMAND: "task"
>> >>>>>>      sched_info = {
>> >>>>>>       pcount = 33,
>> >>>>>>       run_delay = 0,
>> >>>>>>       last_arrival = 67983031958439673,
>> >>>>>>       last_queued = 87029224561119369
>> >>>>>>      },
>> >>>>>>      se.exec_start = 67983031958476937,
>> >>>>>>
>> >>>>>> 67983031         67983031                 87029224
>> 87029229
>> >>>>>> |<-   running on CPU  ->| <-      IN    ->|<-    waiting in queue
>>   ->|
>> >>>>>>
>> >>>>>> For this scenario, the "task" was waiting in the run queue of the
>> CPU only for 5 seconds,
>> >>>>>> we should use the "rq_clock - sched_info->last_queued" formula.
>> >>>
>> >>> Please check if my understanding is correct:
>> >>>
>> >>> The result you saw is "rq_clock - sched_info->last_arrival == 87029229
>> >>> - 67983031 == 19046198"
>> >>> The expected result you want is: "rq_clock - sched_info->last_queued
>> >>> == 87029229 - 87029224 == 5"
>> >>>
>> >>> You think the 19046198 value is misleading and should be 5 which only
>> >>> contains the waiting in queue duration, am I correct?
>> >>>
>> >>> I agree with Kazu's idea, that we shouldn't change the existing ps
>> >>> cmd's behaviour, and runq is a better alternative for the
>> >>> waiting-in-queue duration display.
>> >>>
>> >>> What do you think? Could you please improve your code as well as an
>> >>> updated "help runq" doc for runq?
>> >>>
>> >>> Thanks,
>> >>> Tao Liu
>> >>>
>> >>>>>>
>> >>>>>> We can trust sched_info->last_queued as it is only set when the
>> task enters the CPU run queue.
>> >>>>>> Furthermore, when the task hits/runs on a CPU or dequeues the CPU
>> run queue, it will be reset to 0.
>> >>>>>>
>> >>>>>> Therefore, my idea is simple:
>> >>>>>>
>> >>>>>> If a task in RU stat and sched_info->last_queued has value (!= 0),
>> >>>>>> it means this task is waiting in the run queue, use "rq_clock -
>> sched_info->last_queued".
>> >>>>>>
>> >>>>>> Otherwise, if a task in RU stat and sched_info->last_queued = 0
>> >>>>>> and sched_info->last_arrival has value (it must be), it means this
>> task is running on the CPU,
>> >>>>>> use "rq_clock - sched_info->last_arrival".
>> >>>>>>
>> >>>>>> Signed-off-by: Kenneth Yin <k...@redhat.com>
>> >>>>>> ---
>> >>>>>>     defs.h    |  1 +
>> >>>>>>     symbols.c |  2 ++
>> >>>>>>     task.c    | 21 +++++++++++++++------
>> >>>>>>     3 files changed, 18 insertions(+), 6 deletions(-)
>> >>>>>>
>> >>>>>> diff --git a/defs.h b/defs.h
>> >>>>>> index 4cf169c..66f5ce4 100644
>> >>>>>> --- a/defs.h
>> >>>>>> +++ b/defs.h
>> >>>>>> @@ -1787,6 +1787,7 @@ struct offset_table {                    /*
>> stash of commonly-used offsets */
>> >>>>>>       long vcpu_struct_rq;
>> >>>>>>       long task_struct_sched_info;
>> >>>>>>       long sched_info_last_arrival;
>> >>>>>> +    long sched_info_last_queued;
>> >>>>>>       long page_objects;
>> >>>>>>       long kmem_cache_oo;
>> >>>>>>       long char_device_struct_cdev;
>> >>>>>> diff --git a/symbols.c b/symbols.c
>> >>>>>> index e30fafe..fb5035f 100644
>> >>>>>> --- a/symbols.c
>> >>>>>> +++ b/symbols.c
>> >>>>>> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong
>> makestruct)
>> >>>>>>                     OFFSET(sched_rt_entity_run_list));
>> >>>>>>       fprintf(fp, "       sched_info_last_arrival: %ld\n",
>> >>>>>>                     OFFSET(sched_info_last_arrival));
>> >>>>>> +    fprintf(fp, "       sched_info_last_queued: %ld\n",
>> >>>>>> +            OFFSET(sched_info_last_queued));
>> >>>>>>             fprintf(fp, "       task_struct_thread_info: %ld\n",
>> >>>>>>                     OFFSET(task_struct_thread_info));
>> >>>>>>             fprintf(fp, "             task_struct_stack: %ld\n",
>> >>>>>> diff --git a/task.c b/task.c
>> >>>>>> index 3bafe79..f5386ac 100644
>> >>>>>> --- a/task.c
>> >>>>>> +++ b/task.c
>> >>>>>> @@ -332,9 +332,12 @@ task_init(void)
>> >>>>>>             MEMBER_OFFSET_INIT(task_struct_last_run,
>> "task_struct", "last_run");
>> >>>>>>             MEMBER_OFFSET_INIT(task_struct_timestamp,
>> "task_struct", "timestamp");
>> >>>>>>             MEMBER_OFFSET_INIT(task_struct_sched_info,
>> "task_struct", "sched_info");
>> >>>>>> -    if (VALID_MEMBER(task_struct_sched_info))
>> >>>>>> +    if (VALID_MEMBER(task_struct_sched_info)) {
>> >>>>>>               MEMBER_OFFSET_INIT(sched_info_last_arrival,
>> >>>>>>                       "sched_info", "last_arrival");
>> >>>>>> +            MEMBER_OFFSET_INIT(sched_info_last_queued,
>> >>>>>> +                    "sched_info", "last_queued");
>> >>>>>> +    }
>> >>>>>>       if (VALID_MEMBER(task_struct_last_run) ||
>> >>>>>>           VALID_MEMBER(task_struct_timestamp) ||
>> >>>>>>           VALID_MEMBER(sched_info_last_arrival)) {
>> >>>>>> @@ -6035,7 +6038,7 @@ ulonglong
>> >>>>>>     task_last_run(ulong task)
>> >>>>>>     {
>> >>>>>>             ulong last_run;
>> >>>>>> -    ulonglong timestamp;
>> >>>>>> +    ulonglong timestamp,last_queued;
>> >>>>>>
>> >>>>>>       timestamp = 0;
>> >>>>>>             fill_task_struct(task);
>> >>>>>> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
>> >>>>>>       } else if (VALID_MEMBER(task_struct_timestamp))
>> >>>>>>               timestamp = tt->last_task_read ?
>> ULONGLONG(tt->task_struct +
>> >>>>>>                       OFFSET(task_struct_timestamp)) : 0;
>> >>>>>> -    else if (VALID_MEMBER(sched_info_last_arrival))
>> >>>>>> -            timestamp = tt->last_task_read ?
>> ULONGLONG(tt->task_struct +
>> >>>>>> -                    OFFSET(task_struct_sched_info) +
>> >>>>>> -                    OFFSET(sched_info_last_arrival)) : 0;
>> >>>>>> +    else if (VALID_MEMBER(sched_info_last_queued))
>> >>>>>> +            last_queued = ULONGLONG(tt->task_struct +
>> >>>>>> +                    OFFSET(task_struct_sched_info) +
>> >>>>>> +                    OFFSET(sched_info_last_queued));
>> >>>>>> +            if (last_queued != 0) {
>> >>>>>> +                    timestamp = tt->last_task_read ? last_queued
>> : 0;
>> >>>>>> +            } else if (VALID_MEMBER(sched_info_last_arrival))
>> >>>>>> +                            timestamp = tt->last_task_read ?
>> ULONGLONG(tt->task_struct +
>> >>>>>> +                            OFFSET(task_struct_sched_info) +
>> >>>>>> +                            OFFSET(sched_info_last_arrival)) : 0;
>> >>>>>>
>> >>>>>>             return timestamp;
>> >>>>>>     }
>> >>>>> --
>> >>>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>> >>>>> To unsubscribe send an email to
>> devel-le...@lists.crash-utility.osci.io
>> >>>>> https://${domain_name}/admin/lists/
>> devel.lists.crash-utility.osci.io/
>> >>>>> Contribution Guidelines:
>> https://github.com/crash-utility/crash/wiki
>> >>>> --
>> >>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>> >>>> To unsubscribe send an email to
>> devel-le...@lists.crash-utility.osci.io
>> >>>> https://${domain_name}/admin/lists/
>> devel.lists.crash-utility.osci.io/
>> >>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
>> >>>
>> >>
>> >>
>> >> --
>> >> Kenneth Yin
>> >> Senior Software Maintenance Engineer
>> >> Customer Experience and Engagement
>> >> Phone: +86-10-6533-9459
>> >> Red Hat China
>>
>

-- 
Kenneth Yin
Senior Software Maintenance Engineer
Customer Experience and Engagement
Phone: +86-10-6533-9459
Red Hat China

--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

[Crash-utility] Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

Reply via email to