Yo,

Something I've been wondering about for a while is a class of bug that we
have with clock skew.

Memcached handles the expiration timer via:

- "process_started" timestamp in seconds, which gets initialized at
startup
- "current_time" which, once per second, gets set to the delta between
the current time and "process_started"

If your clock swings around wildly there're a few situations where you
could potentially end up with items expiring immediately or never, such as
current_time underflowing when the clock lands before "process_started".
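
To make the underflow concrete, here is a minimal sketch of the scheme
described above. It is not the actual memcached source: the unsigned 32-bit
rel_time_t, the helper names set_current_time() and item_expired(), and the
"0 means never expire" convention are assumptions made for illustration.

```c
#include <stdint.h>
#include <time.h>

/* Assumption: relative time is an unsigned 32-bit count of seconds. */
typedef uint32_t rel_time_t;

static time_t process_started;   /* wall-clock seconds, noted at startup */
static rel_time_t current_time;  /* seconds elapsed since process_started */

/* Called once per second with the latest wall-clock reading.
 * If `now` is earlier than process_started, the signed difference is
 * negative and wraps to a huge value when stored in rel_time_t. */
static void set_current_time(time_t now) {
    current_time = (rel_time_t)(now - process_started);
}

/* An item is expired once current_time reaches its expiry stamp.
 * Assumption: an expiry stamp of 0 means "never expires". */
static int item_expired(rel_time_t exptime) {
    return exptime != 0 && exptime <= current_time;
}
```

With process_started at 1000, a reading of 1010 gives current_time = 10, so
an item stored with "current_time + 60" gets the stamp 70 and is still live.
If the clock is then set back to 990, current_time wraps to 4294967286 and
that item counts as expired on the spot. A smaller step backward that stays
past process_started has the opposite effect: current_time drops, and
existing items live longer than they were asked to.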

A couple easy ideas off the top of my head that would trade some accuracy
for avoiding timers (and any cross-platform timer idiocy):

- Ditch "process_started" and kick off a counter at 0. Every second the
current_time would be incremented by 1. A relative timeout of "60 seconds
from now" would be set to "current_time + 60" as it presently is. We'd
have to do something special for date formatted expirations, potentially
by noting the exact time once on startup and using that to delta against a
provided date to provide the delta-in-seconds. The latter can still be
influenced by a bad clock, but maybe not as noticeably, and the feature is
less used.

- Add some sanity checks in the clock update function, which will fall
back to incrementing by 1 if it detects a significant clock correction
forward, or if it's gone back in time. Still uses gettimeofday() unless
something goes wrong, keeps plodding forward less accurately when
something does go wrong.

- Use some anti-clock-skew magic that maybe libevent uses. Need to
research more options :P
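
A rough sketch of how the first two ideas could fit together, assuming a
once-per-second tick that is handed the wall-clock reading. Everything named
here is hypothetical: clock_init(), clock_tick(), abs_to_rel(), and the
MAX_FORWARD_JUMP threshold of 5 seconds are illustrative choices, not
existing memcached code.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t rel_time_t;

/* Hypothetical threshold: a forward step larger than this many seconds
 * between two ticks is treated as a clock correction. */
#define MAX_FORWARD_JUMP 5

static time_t start_wall;        /* wall clock noted once at startup */
static time_t last_wall;         /* wall clock seen on the previous tick */
static rel_time_t current_time;  /* counter that starts at 0 */

static void clock_init(time_t now) {
    start_wall = now;
    last_wall = now;
    current_time = 0;
}

/* Called once per second. Trusts the wall-clock delta when it looks sane,
 * otherwise falls back to incrementing by 1. */
static void clock_tick(time_t now) {
    time_t delta = now - last_wall;
    if (delta < 0 || delta > MAX_FORWARD_JUMP) {
        current_time += 1;                  /* skewed: just plod forward */
    } else {
        current_time += (rel_time_t)delta;  /* normal case */
    }
    last_wall = now;
}

/* Converts a date formatted (absolute) expiration into counter time by
 * taking the delta against the startup timestamp. Dates at or before
 * startup map to 1 so they expire right away instead of colliding with
 * 0, which this sketch reserves for "never". */
static rel_time_t abs_to_rel(time_t when) {
    if (when <= start_wall) {
        return 1;
    }
    return (rel_time_t)(when - start_wall);
}
```

In this sketch the counter can never move backward, so a bad clock costs at
most some accuracy on the tick where the jump happens. abs_to_rel() still
depends on the startup reading being sane, which is the residual weakness
of date formatted expirations mentioned in the first idea.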

Anyone care? The increase in the number of these types of reports is
getting obnoxious, and cloud computing's god-awful-ness can only make it
worse.

-Dormando
