On Wed, Jan 21, 2026 at 07:04:35PM +0100, Frederic Weisbecker wrote:
> BTW here is a question for you, does the timer (as in get_cpu_timer()) still
> decrements while in idle? I would assume not, given how lc->system_timer
> is updated in account_idle_time_irq().

It is not decremented while in idle (or when the hypervisor schedules the
virtual cpu away). We use the fact that the cpu timer is not decremented when
the virtual cpu is not running, in contrast to the real time-of-day clock,
to calculate steal time.

> And another question in this same function is this :
>
> 	lc->steal_timer += idle->clock_idle_enter - lc->last_update_clock;
>
> clock_idle_enter is updated right before halting the CPU. But when was
> last_update_clock updated last? Could be either task switch to idle, or
> a previous idle tick interrupt or a previous idle IRQ entry. In any case
> I'm not sure the difference is meaningful as steal time.
>
> I must be missing something.

"It has been like that forever" :)

However I do agree that this doesn't seem to make any sense. At least with the
current implementation I cannot see how that makes sense, since the difference
of two time stamps, neither of which includes any steal time, is added. Maybe
it was broken by one of the many changes over the years, or it was always
wrong, or I am missing something too.

Will investigate and address it if required. Thank you for bringing this up!

> > Not sure what to do with this. I thought about removing those sysfs files
> > already in the past, since they are of very limited use; and most likely
> > nothing in user space would miss them.
>
> Perhaps but this file is a good comparison point against /proc/stat because
> s390 vtime is much closer to measuring the actual CPU halted time than what
> the generic nohz accounting does (which includes more idle code execution).

Yes, while comparing those files I also see an unexpected difference of several
seconds after two days of uptime; that is before your changes. In theory the
sum of idle and iowait in /proc/stat should be the same as the value in the
per-cpu idle_time_us sysfs file. But there is a difference, which shouldn't be
there as far as I can tell. Yet another thing to look into.

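For anyone who wants to reproduce the comparison, here is a minimal user space
sketch. It only does the unit conversion: idle + iowait from one per-cpu line
of /proc/stat (in USER_HZ ticks) is converted to microseconds, so it can be
compared with the number read from idle_time_us. The helper names are made up
for illustration, and user_hz is assumed to come from sysconf(_SC_CLK_TCK).
Since the two files cannot be read atomically, a small delta is expected in
any case.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one per-cpu line of /proc/stat
 * ("cpuN user nice system idle iowait ...") and return idle + iowait
 * converted from USER_HZ ticks to microseconds.
 * Returns 0 on success, -1 if the line is not a per-cpu line.
 */
static int proc_stat_idle_us(const char *line, long user_hz, uint64_t *idle_us)
{
	unsigned long long user, nice, system, idle, iowait;
	unsigned int cpu;

	if (user_hz <= 0)
		return -1;
	/* Reject the aggregate "cpu " line and anything else. */
	if (strncmp(line, "cpu", 3) != 0 || line[3] < '0' || line[3] > '9')
		return -1;
	if (sscanf(line, "cpu%u %llu %llu %llu %llu %llu",
		   &cpu, &user, &nice, &system, &idle, &iowait) != 6)
		return -1;
	*idle_us = (idle + iowait) * 1000000ULL / (unsigned long long)user_hz;
	return 0;
}

/* Signed difference (sysfs value minus /proc/stat value) in microseconds. */
static int64_t idle_delta_us(uint64_t sysfs_idle_us, uint64_t proc_idle_us)
{
	return (int64_t)(sysfs_idle_us - proc_idle_us);
}
```

With USER_HZ = 100 the /proc/stat side has a granularity of 10ms per tick, so
only a delta that keeps growing over time is of interest.
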
> > Guess I need to spend some more time on accounting and see what it would
> > take to convert to VIRT_CPU_ACCOUNTING_GEN, while keeping the current
> > precision and functionality.
>
> I would expect more overhead with VIRT_CPU_ACCOUNTING_GEN, though that has yet
> to be measured. In any case you'll lose some idle cputime precision (but
> you need to read that through s390 sysfs files) if what we want to measure
> here is the actual halted time.
>
> Perhaps we could enhance VIRT_CPU_ACCOUNTING_GEN and nohz idle cputime
> accounting to match s390 precision. Though I expect some cost
> accessing the clock inevitably more often on some machines.

Let me experiment with that, but first I want to understand the oddities
pointed out above.
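
To make the steal time idea from the top of this mail concrete, here is a
simplified sketch. This is not the kernel code; the struct and function names
are hypothetical, and both clocks are assumed to be sampled in the same unit.
Over an interval without idle, the time-of-day clock advances by the full
elapsed time, while the cpu timer only counts down while the virtual cpu is
actually running. The difference is what the hypervisor took away.

```c
#include <stdint.h>

/* Hypothetical snapshot of both clocks, taken at the same point in time. */
struct clock_sample {
	uint64_t tod;		/* time-of-day clock, always advancing */
	uint64_t cpu_timer;	/* counts down only while the virtual cpu runs */
};

/*
 * Steal time between two snapshots: elapsed wall time minus the time the
 * virtual cpu really ran. Assumes the interval contains no idle time,
 * since the cpu timer is not decremented while idle either.
 */
static uint64_t steal_between(const struct clock_sample *prev,
			      const struct clock_sample *now)
{
	uint64_t wall = now->tod - prev->tod;
	uint64_t ran = prev->cpu_timer - now->cpu_timer; /* timer decrements */

	return wall > ran ? wall - ran : 0;
}
```

This also shows why subtracting two time-of-day stamps alone, as in the
steal_timer update quoted above, cannot yield steal time: without the cpu
timer delta there is nothing to compare the wall time against.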
