On Mon, 2016-06-27 at 14:25 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 22, 2016 at 10:25:47PM -0400, [email protected] wrote:
> > 
> > From: Rik van Riel <[email protected]>
> > 
> > Currently, if there was any irq or softirq time during 'ticks'
> > jiffies, the entire period will be accounted as irq or softirq
> > time.
> > 
> > This is inaccurate if only a subset of 'ticks' jiffies was
> > actually spent handling irqs, and could conceivably mis-count
> > all of the ticks during a period as irq time, when there was
> > some irq and some softirq time.
> Good catch!
> 
> Many comments following.
> 
> > 
> > 
> > This can actually happen when irqtime_account_process_tick
> > is called from account_idle_ticks, which can pass a larger
> > number of ticks down all at once.
> > 
> > Fix this by changing irqtime_account_hi_update and
> > irqtime_account_si_update to round elapsed irq and softirq
> > time to jiffies, and return the number of jiffies spent in
> > each mode, similar to how steal time is handled.
> > 
> > Additionally, have irqtime_account_process_tick take into
> > account how much time was spent in each of steal, irq,
> > and softirq time.
> > 
> > The latter could help improve the accuracy of timekeeping
> Maybe you meant cputime? Timekeeping is rather about jiffies and
> GTOD.
> 
> > 
> > when returning from idle on a NO_HZ_IDLE CPU.
> > 
> > Properly accounting how much time was spent in hardirq and
> > softirq time will also allow the NO_HZ_FULL code to re-use
> > these same functions for hardirq and softirq accounting.
> > 
> > Signed-off-by: Rik van Riel <[email protected]>
> > 
> >     local_irq_save(flags);
> > -   latest_ns = this_cpu_read(cpu_hardirq_time);
> > -   if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
> > -           ret = 1;
> > +   irq = this_cpu_read(cpu_hardirq_time) -
> > cpustat[CPUTIME_IRQ];
> cpu_hardirq_time is made of nsecs whereas cpustat is of cputime_t (in
> fact
> even cputime64_t). So you need to convert cpu_hardirq_time before
> doing the
> substract.

Doh. Good catch!

> > -static int irqtime_account_si_update(void)
> > +static unsigned long irqtime_account_si_update(unsigned long
> > max_jiffies)
> >  {
> >     u64 *cpustat = kcpustat_this_cpu->cpustat;
> > +   unsigned long si_jiffies = 0;
> >     unsigned long flags;
> > -   u64 latest_ns;
> > -   int ret = 0;
> > +   u64 softirq;
> >  
> >     local_irq_save(flags);
> > -   latest_ns = this_cpu_read(cpu_softirq_time);
> > -   if (nsecs_to_cputime64(latest_ns) >
> > cpustat[CPUTIME_SOFTIRQ])
> > -           ret = 1;
> > +   softirq = this_cpu_read(cpu_softirq_time) -
> > cpustat[CPUTIME_SOFTIRQ];
> > +   if (softirq > cputime_one_jiffy) {
> > +           si_jiffies = min(max_jiffies,
> > cputime_to_jiffies(softirq));
> > +           cpustat[CPUTIME_SOFTIRQ] +=
> > jiffies_to_cputime(si_jiffies);
> > +   }
> >     local_irq_restore(flags);
> > -   return ret;
> > +   return si_jiffies;
> So same comments apply here.
> 
> [...]
> > 
> >   * Accumulate raw cputime values of dead tasks (sig->[us]time) and
> > live
> >   * tasks (sum on group iteration) belonging to @tsk's group.
> >   */
> > @@ -344,19 +378,24 @@ static void
> > irqtime_account_process_tick(struct task_struct *p, int user_tick,
> >  {
> >     cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
> >     u64 cputime = (__force u64) cputime_one_jiffy;
> > -   u64 *cpustat = kcpustat_this_cpu->cpustat;
> > +   unsigned long other;
> >  
> > -   if (steal_account_process_tick(ULONG_MAX))
> > +   /*
> > +    * When returning from idle, many ticks can get accounted
> > at
> > +    * once, including some ticks of steal, irq, and softirq
> > time.
> > +    * Subtract those ticks from the amount of time accounted
> > to
> > +    * idle, or potentially user or system time. Due to
> > rounding,
> > +    * other time can exceed ticks occasionally.
> > +    */
> > +   other = account_other_ticks(ticks);
> > +   if (other >= ticks)
> >             return;
> > +   ticks -= other;
> >  
> >     cputime *= ticks;
> >     scaled *= ticks;
> So instead of dealing with ticks here, I think you should rather use
> the above
> cputime as both the limit and the remaining time to account after
> steal/irqs.
> 
> This should avoid some middle conversions and improve precision when
> cputime_t == nsecs granularity.
> 
> If we account 2 ticks to idle (lets say HZ=100) and irq time to
> account is 15 ms. 2 ticks = 20 ms
> so we have 5 ms left to account to idle. With the jiffies granularity
> in this patch, we would account
> one tick to irqtime (1 tick = 10 ms, there will be 5 ms to account
> back later) and one tick to idle
> time whereas if you deal with cputime_t, you are going to account the
> correct amount of idle time.
> 

Ahhh, so you want irqtime_account_process_tick to work with
and account fractional ticks when calling account_system_time,
account_user_time, account_idle_time, etc?

I guess that should work fine since we already pass cputime
values in, anyway.

I suppose we can do the same for get_vtime_delta, too.

They can both work with the actual remaining time (in cputime_t),
after the other time has been subtracted.

I can rework the series to do that.

-- 
All Rights Reversed.

Attachment: signature.asc
Description: This is a digitally signed message part

Reply via email to