On Mon, May 26, 2025 at 5:04 PM <devel-requ...@lists.crash-utility.osci.io> wrote:
> Date: Fri, 23 May 2025 05:23:40 +0000 > From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio...@nec.com> > Subject: [Crash-utility] Re: [PATCH] Fix the "ps -m" command shows > wrong duration of RU task > To: Tao Liu <l...@redhat.com>, Ke Yin <k...@redhat.com> > Cc: "devel@lists.crash-utility.osci.io" > <devel@lists.crash-utility.osci.io> > Message-ID: <23c74e07-d439-422a-bbea-8b2bf49b3...@nec.com> > Content-Type: text/plain; charset="utf-8" > > Hi, > > On 2025/05/15 8:43, Tao Liu wrote: > > Hi Ke Yin, > > > > On Wed, May 14, 2025 at 8:58 PM Ke Yin <k...@redhat.com> wrote: > >> > >> Hi Tao Liu & Kazu, > >> > >> Thanks for replying and sharing your thoughts. > >> > >> After a quick review of crash tool code, I found: > >> > >> runq -m will call dump_on_rq_milliseconds() to print the amount > >> of time that the active task on each cpu has been running, > >> but only for the current running task. > >> > >> runq -d will call dump_on_rq_tasks() to print all tasks in the run queue > >> and the task running on cpu without calling translate_nanoseconds(). > >> > >> My preliminary idea is to combine these two functions and add a new > >> parameter, for example -q, to print the tasks on each cpu that has > >> been waiting in the run queue only. And as well as update doc of runq. > >> > >> In short: > >> runq -q will call new_function which is the modified function based on > dump_on_rq_tasks() (skip current + translate_nanoseconds). > >> > >> What do you think? > > I didn't know the "runq -d" option because it's a kind of debugging > option and has no description in the help page. Also it searches all > tasks for ones that have on_rq = 1 and doesn't look very efficient > (nr_tasks * nr_cpus). so ideally, maybe a new function should be based > on dump_runq() than based on dump_on_rq_tasks(), if possible.. > > Looks like getting a solution: adding a new option(E.g: runq -q) to achieve this purpose? Or need more discussion? Any update? Thanks Lianbo > Thanks, > Kazu > > > > > > I'm OK with your idea in general. Please check if I understood > > correctly, your implementation is like: > > cmd_runq() { > > ... > > if (-d option) { > > dump_on_rq_tasks(old path); > > } else if (-q option) { > > dump_on_rq_tasks(new path); > > } > > } > > > > dump_on_rq_tasks(option) > > { > > ... > > for (i = 0; i < RUNNING_TASKS(); i++, tc++) { > > if (old path) // Old path stay unchanged > > dump_task_runq_entry(tc, 0); > > else // New path will output your time duration > > your_new_function_with_translate_nanoseconds(); > > } > > } > > > > Thanks, > > Tao Liu > > > >> > >> Thanks > >> Kenneth Yin > >> > >> > >> > >> > >> On Mon, May 12, 2025 at 1:36 PM Tao Liu <l...@redhat.com> wrote: > >>> > >>> Hi Kazu & Kenneth, > >>> > >>> Sorry for the late reply, and thanks for your fix and comments! > >>> > >>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾 一仁) > >>> <k-hagio...@nec.com> wrote: > >>>> > >>>> On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote: > >>>>> Hi, > >>>>> > >>>>> On 2025/04/28 19:38, Kenneth Yin wrote: > >>>>>> The RU/TASK_RUNNING stat means the task is runnable. > >>>>>> It is either currently running or on a run queue waiting to run. > >>>>>> > >>>>>> Currently, the crash tool uses the "rq_clock - > sched_info->last_arrival" formula to > >>>>>> calculate the duration of task in RU state. This is for the > scenario of a task running on a CPU. > >>>>> > >>>>> The "ps -l" and "ps -m" options display what their help text > describes, > >>>>> not the duration of task in RU state. Please see "help ps". > >>>>> > >>>>> Also, tasks are sorted by the value, using different values for it > could > >>>>> make another confusion. > >>>>> > >>>>> The options have been used for a long time with the current code, if > we > >>>>> change the semantics of the options, it would be better to be > careful. > >>>>> The change might lose a kind of information instead of getting > another > >>>>> kind of information. > >>>>> > >>>>> On the other hand, I think that the duration of waiting in queue > might > >>>>> also be useful information. I'm not sure how we should display them, > >>>>> but for example, how about adding a new option or adding a column for > >>>>> last_queued? > >>>> > >>>> I thought of that the "runq" command might be suitable to display the > >>>> waiting duration, because only tasks in the run queues have it. For > >>>> example, extending the "runq -m" option or adding a new option. just > my > >>>> thought. > >>>> > >>>> Thanks, > >>>> Kazu > >>>> > >>>>> > >>>>> What do you think, folks? > >>>>> > >>>>> Thanks, > >>>>> Kazu > >>>>> > >>>>>> > >>>>>> But for the scenario of a task waiting in the CPU run queue (due > to some reason > >>>>>> for example cfs/rt queue throttled), this formula could cause > misunderstanding. > >>>>>> > >>>>>> For example: > >>>>>> [ 220 10:36:38.026] [RU] PID: 12345 TASK: ffff8d674ab6b180 CPU: > 1 COMMAND: "task" > >>>>>> > >>>>>> Looking closer: > >>>>>> > >>>>>> crash> rq.clock ffff8de438a5acc0 > >>>>>> clock = 87029229985307234, > >>>>>> > >>>>>> crash> task -R sched_info,se.exec_start > >>>>>> PID: 12345 TASK: ffff8d674ab6b180 CPU: 1 COMMAND: "task" > >>>>>> sched_info = { > >>>>>> pcount = 33, > >>>>>> run_delay = 0, > >>>>>> last_arrival = 67983031958439673, > >>>>>> last_queued = 87029224561119369 > >>>>>> }, > >>>>>> se.exec_start = 67983031958476937, > >>>>>> > >>>>>> 67983031 67983031 87029224 > 87029229 > >>>>>> |<- running on CPU ->| <- IN ->|<- waiting in queue > ->| > >>>>>> > >>>>>> For this scenario, the "task" was waiting in the run queue of the > CPU only for 5 seconds, > >>>>>> we should use the "rq_clock - sched_info->last_queued" formula. > >>> > >>> Please check if my understanding is correct: > >>> > >>> The result you saw is "rq_clock - sched_info->last_arrival == 87029229 > >>> - 67983031 == 19046198" > >>> The expected result you want is: "rq_clock - sched_info->last_queued > >>> == 87029229 - 87029224 == 5" > >>> > >>> You think the 19046198 value is misleading and should be 5 which only > >>> contains the waiting in queue duration, am I correct? > >>> > >>> I agree with Kazu's idea, that we shouldn't change the existing ps > >>> cmd's behaviour, and runq is a better alternative for the > >>> waiting-in-queue duration display. > >>> > >>> What do you think? Could you please improve your code as well as an > >>> updated "help runq" doc for runq? > >>> > >>> Thanks, > >>> Tao Liu > >>> > >>>>>> > >>>>>> We can trust sched_info->last_queued as it is only set when the > task enters the CPU run queue. > >>>>>> Furthermore, when the task hits/runs on a CPU or dequeues the CPU > run queue, it will be reset to 0. > >>>>>> > >>>>>> Therefore, my idea is simple: > >>>>>> > >>>>>> If a task in RU stat and sched_info->last_queued has value (!= 0), > >>>>>> it means this task is waiting in the run queue, use "rq_clock - > sched_info->last_queued". > >>>>>> > >>>>>> Otherwise, if a task in RU stat and sched_info->last_queued = 0 > >>>>>> and sched_info->last_arrival has value (it must be), it means this > task is running on the CPU, > >>>>>> use "rq_clock - sched_info->last_arrival". > >>>>>> > >>>>>> Signed-off-by: Kenneth Yin <k...@redhat.com> > >>>>>> --- > >>>>>> defs.h | 1 + > >>>>>> symbols.c | 2 ++ > >>>>>> task.c | 21 +++++++++++++++------ > >>>>>> 3 files changed, 18 insertions(+), 6 deletions(-) > >>>>>> > >>>>>> diff --git a/defs.h b/defs.h > >>>>>> index 4cf169c..66f5ce4 100644 > >>>>>> --- a/defs.h > >>>>>> +++ b/defs.h > >>>>>> @@ -1787,6 +1787,7 @@ struct offset_table { /* > stash of commonly-used offsets */ > >>>>>> long vcpu_struct_rq; > >>>>>> long task_struct_sched_info; > >>>>>> long sched_info_last_arrival; > >>>>>> + long sched_info_last_queued; > >>>>>> long page_objects; > >>>>>> long kmem_cache_oo; > >>>>>> long char_device_struct_cdev; > >>>>>> diff --git a/symbols.c b/symbols.c > >>>>>> index e30fafe..fb5035f 100644 > >>>>>> --- a/symbols.c > >>>>>> +++ b/symbols.c > >>>>>> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong > makestruct) > >>>>>> OFFSET(sched_rt_entity_run_list)); > >>>>>> fprintf(fp, " sched_info_last_arrival: %ld\n", > >>>>>> OFFSET(sched_info_last_arrival)); > >>>>>> + fprintf(fp, " sched_info_last_queued: %ld\n", > >>>>>> + OFFSET(sched_info_last_queued)); > >>>>>> fprintf(fp, " task_struct_thread_info: %ld\n", > >>>>>> OFFSET(task_struct_thread_info)); > >>>>>> fprintf(fp, " task_struct_stack: %ld\n", > >>>>>> diff --git a/task.c b/task.c > >>>>>> index 3bafe79..f5386ac 100644 > >>>>>> --- a/task.c > >>>>>> +++ b/task.c > >>>>>> @@ -332,9 +332,12 @@ task_init(void) > >>>>>> MEMBER_OFFSET_INIT(task_struct_last_run, "task_struct", > "last_run"); > >>>>>> MEMBER_OFFSET_INIT(task_struct_timestamp, > "task_struct", "timestamp"); > >>>>>> MEMBER_OFFSET_INIT(task_struct_sched_info, > "task_struct", "sched_info"); > >>>>>> - if (VALID_MEMBER(task_struct_sched_info)) > >>>>>> + if (VALID_MEMBER(task_struct_sched_info)) { > >>>>>> MEMBER_OFFSET_INIT(sched_info_last_arrival, > >>>>>> "sched_info", "last_arrival"); > >>>>>> + MEMBER_OFFSET_INIT(sched_info_last_queued, > >>>>>> + "sched_info", "last_queued"); > >>>>>> + } > >>>>>> if (VALID_MEMBER(task_struct_last_run) || > >>>>>> VALID_MEMBER(task_struct_timestamp) || > >>>>>> VALID_MEMBER(sched_info_last_arrival)) { > >>>>>> @@ -6035,7 +6038,7 @@ ulonglong > >>>>>> task_last_run(ulong task) > >>>>>> { > >>>>>> ulong last_run; > >>>>>> - ulonglong timestamp; > >>>>>> + ulonglong timestamp,last_queued; > >>>>>> > >>>>>> timestamp = 0; > >>>>>> fill_task_struct(task); > >>>>>> @@ -6047,10 +6050,16 @@ task_last_run(ulong task) > >>>>>> } else if (VALID_MEMBER(task_struct_timestamp)) > >>>>>> timestamp = tt->last_task_read ? > ULONGLONG(tt->task_struct + > >>>>>> OFFSET(task_struct_timestamp)) : 0; > >>>>>> - else if (VALID_MEMBER(sched_info_last_arrival)) > >>>>>> - timestamp = tt->last_task_read ? > ULONGLONG(tt->task_struct + > >>>>>> - OFFSET(task_struct_sched_info) + > >>>>>> - OFFSET(sched_info_last_arrival)) : 0; > >>>>>> + else if (VALID_MEMBER(sched_info_last_queued)) > >>>>>> + last_queued = ULONGLONG(tt->task_struct + > >>>>>> + OFFSET(task_struct_sched_info) + > >>>>>> + OFFSET(sched_info_last_queued)); > >>>>>> + if (last_queued != 0) { > >>>>>> + timestamp = tt->last_task_read ? last_queued : > 0; > >>>>>> + } else if (VALID_MEMBER(sched_info_last_arrival)) > >>>>>> + timestamp = tt->last_task_read ? > ULONGLONG(tt->task_struct + > >>>>>> + OFFSET(task_struct_sched_info) + > >>>>>> + OFFSET(sched_info_last_arrival)) : 0; > >>>>>> > >>>>>> return timestamp; > >>>>>> } > >>>>> -- > >>>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io > >>>>> To unsubscribe send an email to > devel-le...@lists.crash-utility.osci.io > >>>>> https://${domain_name}/admin/lists/ > devel.lists.crash-utility.osci.io/ > >>>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki > >>>> -- > >>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io > >>>> To unsubscribe send an email to > devel-le...@lists.crash-utility.osci.io > >>>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ > >>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki > >>> > >> > >> > >> -- > >> Kenneth Yin > >> Senior Software Maintenance Engineer > >> Customer Experience and Engagement > >> Phone: +86-10-6533-9459 > >> Red Hat China >
-- Crash-utility mailing list -- devel@lists.crash-utility.osci.io To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/ Contribution Guidelines: https://github.com/crash-utility/crash/wiki