On Fri, 25 May 2018 04:56:25 +0200 Frederic Weisbecker <[email protected]> wrote:
> On Tue, May 22, 2018 at 10:10:19PM +0300, Yauheni Kaliuta wrote:
> > Hi, Frederic!
> >
> > >>>>> On Mon, 29 Jan 2018 02:10:26 +0100, Frederic Weisbecker wrote:
> > > On Wed, Jan 24, 2018 at 10:46:08AM -0500, Luiz Capitulino wrote:
> >
> > [...]
> >
> > >> Since the 1Hz tick offload worked for you, I must be missing
> > >> a way to disable this timer or the kernel is thinking my CPU
> > >> has unstable TSC (which it doesn't AFAIK).
> >
> > > It's beyond the scope of this patchset but indeed that's
> > > right, I run my kernels with tsc=reliable because my CPUs
> > > don't have the TSC_RELIABLE flag. That's the only way I found
> > > to shutdown the tick completely on my test machine, otherwise
> > > I keep having that clocksource watchdog.
> >
> > [...]
> >
> > Thanks, it helps. But I have accounting problem:
> >
> > if I run user busy loop on the nohz cpu, the task accounting works
> > correctly (top shows the task takes 100% cpu), but cpu accounting is
> > wrong (cpu is 100% idle, in the per-core view as well).
> >
> > If I understand correctly, the stats are updated by account_user_time()
> > -> task_group_account_field() but there is no call for it in case of
> > offloading (it is called from irqtime_account_process_tick,
> > account_process_tick, vtime_user_exit).
>
> Ah I forgot about kcpustat accounting. I remember I wanted to fix that a
> few years ago but I forgot about it when I removed the last tick. That
> thing was lurking behind 1Hz.
>
> >
> > Moreover, task_group_account_field() uses __this_cpu_add() which will be
> > wrong for offloading.
> >
> > For testing I used kcpustat_cpu(task_cpu(p)) in
> > task_group_account_field() and added call account_user_time(curr, delta)
> > to the sched_tick_remote() what fixes it for me, but what would be the
> > proper fix?
>
> Yeah unfortunately that's unsafe. Task accounting is not designed for remote
> update. You could race with an update from another CPU, especially the local
> updater.
>
> I fear we need to take the same approach than task cputime, which is using a
> seqcount for updates. Then the reader would fetch the kcpustat values + the
> delta vtime from the task executing.
>
> Things can get complicated once we dive into corner cases: CPUTIME_IRQ,
> CPUTIME_SOFTIRQ, and CPUTIME_STEAL. At least we don't need to care about
> CPUTIME_IDLE and CPUTIME_IOWAIT that have their own delta.
>
> I'm trying that.

Cool! Needless to say, but we can help testing once you have patches.
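
For reference, the seqcount idea quoted above (a remote reader takes the accumulated value plus the running task's pending delta, and retries if the local updater was mid-update) could look roughly like the user-space sketch below. It is only a model under assumptions: it uses C11 atomics rather than the kernel's seqcount_t, it assumes a single writer (the local CPU), and every name in it (cpustat_model, model_task_enter_user, model_task_exit_user, model_read_user) is hypothetical and not taken from the kernel tree or from any patch in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of one CPU's user-time stat. */
struct cpustat_model {
	atomic_uint seq;                /* even: stable, odd: update in progress */
	_Atomic uint64_t user_ns;       /* time already flushed to the stat */
	_Atomic uint64_t vtime_start;   /* start of the not-yet-flushed slice */
	atomic_bool task_running;       /* is a task accumulating user time? */
};

/* Writer side: assumed to be called by a single updater only. */
static void model_write_begin(struct cpustat_model *s)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1u) == 0);        /* no nested or concurrent writer */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	/* Order the odd sequence store before the data stores that follow. */
	atomic_thread_fence(memory_order_release);
}

static void model_write_end(struct cpustat_model *s)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert(seq & 1u);
	/* Release: the data stores are visible before the even sequence. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_release);
}

/* The task starts running in user mode at time 'now'. */
static void model_task_enter_user(struct cpustat_model *s, uint64_t now)
{
	model_write_begin(s);
	atomic_store_explicit(&s->vtime_start, now, memory_order_relaxed);
	atomic_store_explicit(&s->task_running, true, memory_order_relaxed);
	model_write_end(s);
}

/* The task leaves user mode at time 'now': flush the pending delta. */
static void model_task_exit_user(struct cpustat_model *s, uint64_t now)
{
	uint64_t start = atomic_load_explicit(&s->vtime_start, memory_order_relaxed);
	uint64_t total = atomic_load_explicit(&s->user_ns, memory_order_relaxed);

	model_write_begin(s);
	atomic_store_explicit(&s->user_ns, total + (now - start), memory_order_relaxed);
	atomic_store_explicit(&s->task_running, false, memory_order_relaxed);
	model_write_end(s);
}

/* Reader side: may run on any CPU; returns flushed time + pending delta. */
static uint64_t model_read_user(struct cpustat_model *s, uint64_t now)
{
	unsigned int seq0, seq1;
	uint64_t total;

	do {
		seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		total = atomic_load_explicit(&s->user_ns, memory_order_relaxed);
		if (atomic_load_explicit(&s->task_running, memory_order_relaxed))
			total += now - atomic_load_explicit(&s->vtime_start,
							    memory_order_relaxed);
		/* Order the data loads before re-checking the sequence. */
		atomic_thread_fence(memory_order_acquire);
		seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((seq0 & 1u) || seq0 != seq1);  /* writer was active: retry */

	return total;
}
```

With a zero-initialized struct cpustat_model, calling model_task_enter_user() at t=100 and then model_read_user() at t=250 reports the 150 units still pending in the running task, without the reader ever writing to the per-CPU data. The IRQ, softirq and steal corner cases mentioned above are not modelled here at all.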

