On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote:
> Sometime in the Dec 10th through Jan 7th timeframe a timing bug has been 
> introduced to 11.x/head/-current.    With HZ=1000 (the default for bare metal,
> not for a vm); the clocks stop just after 24 days of uptime.  This means
> things like cron, sleep, timeouts etc stop working.  TCP/IP won't time out or 
> retransmit, etc etc.  It can get ugly.
> The problem is NOT in 10.x/-stable.
> We hit this in the freebsd.org cluster, the builds that we used are:
> FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> FreeBSD 11.0-CURRENT #0 r276779: Wed Jan  7 18:47:09 UTC 2015 - broken
> If you are running -current in a situation where it'll accumulate uptime, you 
> may want to take precautions.  A reboot prior to 24 days uptime (as horrible
> a workaround as that is) will avoid it.
> Yes, this is being worked on.

So the issue is reproducible within 3 minutes of boot with the following
change in kern_clock.c:
volatile int    ticks = INT_MAX - (/*hz*/1000 * 3 * 60);

It is fixed (in the proper sense of the word, not worked around or
papered over) by the patch at the end of this mail.

We already have a history of trying to enable the much less ambitious
option -fno-strict-overflow; see r259045 and its revert in r259422.  I do
not see any other way than to try one more time.  Too many places in the
kernel depend on correctly wrapping two's-complement arithmetic, among
them the callwheel and the scheduler.

diff --git a/sys/conf/kern.mk b/sys/conf/kern.mk
index c031b3a..eb7ce2f 100644
--- a/sys/conf/kern.mk
+++ b/sys/conf/kern.mk
@@ -158,6 +158,11 @@ INLINE_LIMIT?=     8000
 CFLAGS+=       -ffreestanding
+# Make signed arithmetic wrap.
+CFLAGS+=       -fwrapv
 # GCC SSP support
 .if ${MK_SSP} != "no" && \
freebsd-current@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"