> Date: Thu, 27 Jul 2023 15:05:23 +1000
> from: matthew green <m...@eterna.com.au>
>
> one problem i've seen in kern_tc.c when the timecounter returns
> a smaller value is that tc_delta() ends up returning a very large
> (underflowed) value, and that makes the consumers of it do a very
> wrong thing.  eg, -2 becomes 2^32-2, and then eg in binuptime:
>
>  477         bintime_addx(bt, th->th_scale * tc_delta(th));
>
> or in tc_windup():
>
>  933         delta = tc_delta(th);
>  938         th->th_offset_count += delta;
>  939         bintime_addx(&th->th_offset, th->th_scale * delta);
>
> i "fixed" the time goes backwards on sparc issue a few years ago
> with this change, which avoids the above issue:
>
>    http://mail-index.netbsd.org/source-changes/2018/01/12/msg091064.html
>
> but i really think that the way tc_delta() can underflow is a
> bad problem we should fix properly, i just wasn't sure of the
> right way to do it.

I don't understand; what do you mean by underflow here?

Part of the API contract of a k-bit timecounter(9) is that the
underlying clock must not have a frequency higher than f * 2^k / 2,
where f is the frequency of tc_windup calls.[*]  For example, if
f = 100 Hz (i.e., hz=100), and k = 32 (as we use), then the maximum
timecounter frequency is 100 Hz * 2^32 / 2 ~= 214 GHz.  Even if
f = 10 Hz, this is 21.4 GHz.

Under this premise, in the duration between two tc_windup calls,
consecutive calls to get_timecount() mod 2^k can't differ by more
than 2^k / 2.  And each call to tc_windup resets

	th->th_offset_count := get_timecount().

So no matter how many times you call tc_delta(th) within that time,
(get_timecount() - th->th_offset_count) mod 2^k can't wrap around,
i.e., a sequence of calls must yield a nondecreasing sequence of
k-bit integers.

I don't know what the sparc timecounter frequency is, but the Xen
system timecounter returns units of nanoseconds, i.e., runs at
1 GHz, well within these bounds.  So this kind of wraparound leading
to apparently negative runtime -- that is, l->l_stime going
backwards -- should not be possible as long as we are calling
tc_windup() at a frequency of at least 1 GHz / (2^k / 2) = 0.47 Hz.

That said, at a 32-bit timecounter frequency of 1 GHz, if there is a
period of about 2^32 / 1 GHz ~= 4.3 sec during which we miss all
consecutive hardclock ticks, that would violate the timecounter(9)
assumptions, and tc_delta(th) may go backwards if that happens.

So I think we need to find out why we're missing Xen hardclock timer
interrupts.  We should also make the dtrace probe show exactly how
many hardclock ticks happened in a batch, and raise an alarm (with
or without dtrace) if that number exceeds a threshold.

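To make the modular arithmetic concrete, here is a minimal sketch of
the delta computation.  This is not the kern_tc.c code: delta_sketch
and delta_in_safe_range are hypothetical names, and the mask is
assumed to be 2^k - 1 for a k-bit counter.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the delta computation: the difference
 * between the current counter reading and the reading saved at the
 * last windup, reduced mod 2^k by the counter mask (2^k - 1).
 */
uint32_t
delta_sketch(uint32_t now, uint32_t offset_count, uint32_t mask)
{
	/* Unsigned subtraction wraps mod 2^32; the mask reduces to k bits. */
	return (now - offset_count) & mask;
}

/*
 * Hypothetical sanity check: nonzero if the delta lies below
 * 2^k / 2, the range the frequency bound is meant to guarantee.
 */
int
delta_in_safe_range(uint32_t delta, uint32_t mask)
{
	return delta <= mask / 2;
}
```

With a 32-bit mask, a counter that legitimately wrapped from
0xfffffffb to 5 gives a delta of 10, which is what we want.  A
counter that reads 998 after the saved value was 1000 gives
0xfffffffe, i.e., the "-2 becomes 2^32-2" case from the quoted
message, and that value falls far outside the safe range.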
[*] Actually the limit is closer to f * 2^k, not f * 2^k / 2, but there probably has to be a little slop for the computational overhead of tc_windup to ensure the timehands are updated before tc_delta would wrap around; a factor of two gives a comfortable margin of error here.
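
For anyone who wants to check the figures above, here is a small
sketch of the arithmetic.  The function names are made up for
illustration and nothing here comes from the kernel sources; it
assumes k <= 32 and uses the conservative f * 2^k / 2 bound.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum counter frequency in Hz for windup rate f: f * 2^k / 2. */
uint64_t
max_counter_hz(uint64_t windup_hz, unsigned k)
{
	assert(k >= 1 && k <= 32);
	return windup_hz << (k - 1);
}

/*
 * Minimum windup rate in millihertz for a given counter frequency:
 * counter_hz / (2^k / 2), scaled by 1000 and rounded down.
 */
uint64_t
min_windup_mhz(uint64_t counter_hz, unsigned k)
{
	assert(k >= 1 && k <= 32);
	return (counter_hz * 1000) >> (k - 1);
}

/* Time in nanoseconds for a k-bit counter to wrap fully: 2^k / counter_hz. */
uint64_t
wrap_period_ns(uint64_t counter_hz, unsigned k)
{
	assert(k >= 1 && k <= 32 && counter_hz > 0);
	return ((UINT64_C(1) << k) * UINT64_C(1000000000)) / counter_hz;
}
```

For k = 32 this gives 214748364800 Hz at f = 100 Hz and
21474836480 Hz at f = 10 Hz.  For a 1 GHz counter, the minimum
windup rate comes out to 465 mHz (rounded down) and the full wrap
period to 4294967296 ns, consistent with the ~0.47 Hz and ~4.3 sec
figures.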