I am busy implementing some heartbeat monitoring code between two
machines. The spec calls for 1-second recovery.

Basically, if I get no heartbeats for 1 full second, then I should
consider the peer system to have failed.
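The failure rule itself is a single comparison. The sketch below isolates it as a pure function of two millisecond timestamps so it can be tested without a real clock. `peer_timed_out` and its parameters are hypothetical names, and both timestamps are assumed to come from the same clock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true once strictly more than timeout_ms
 * milliseconds have elapsed since the last heartbeat. Both
 * timestamps are assumed to be in ms from the same clock. */
static bool peer_timed_out(long long now_ms, long long last_recv_ms,
                           long long timeout_ms)
{
    assert(timeout_ms >= 0);
    return now_ms - last_recv_ms > timeout_ms;
}
```

With the last heartbeat at 5000 ms, `peer_timed_out(6000, 5000, 1000)` is false (exactly 1 s has passed) and `peer_timed_out(6001, 5000, 1000)` is true.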

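Because a leap second is inserted (or removed) at the end of a UTC
day, the system clock may be stepped by a second at that moment,
which distorts any interval measured with gettimeofday(). To cope
with this leap-second scenario, one solution is to use a timeout 1
second longer than usual if the current time is close to the
turnover of the day. You can do this easily by checking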

   time(NULL) % 86400

(the number of seconds since midnight UTC) and, if we are near the
turnover of the day, use a 2-second timeout instead of a 1-second
timeout.
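A minimal sketch of that check, assuming POSIX `time_t` values (seconds since 1970-01-01 UTC, with every day counted as exactly 86400 seconds). `seconds_into_utc_day`, `heartbeat_timeout_ms` and the `window_s` parameter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <time.h>

/* Seconds elapsed since the most recent UTC midnight. POSIX time
 * treats every day as exactly 86400 seconds, so a plain modulus
 * works (assumes t >= 0, i.e. a date after 1970). */
static long seconds_into_utc_day(time_t t)
{
    assert(t >= 0);
    return (long)(t % 86400);
}

/* Hypothetical policy: use a 2000 ms timeout within window_s seconds
 * either side of UTC midnight, otherwise 1000 ms. */
static long long heartbeat_timeout_ms(time_t t, long window_s)
{
    long s = seconds_into_utc_day(t);
    int near_midnight = (s >= 86400 - window_s) || (s <= window_s);
    return near_midnight ? 2000 : 1000;
}
```

For example, 1483228800 is 2017-01-01 00:00:00 UTC, the instant right after the leap second at the end of 2016: `seconds_into_utc_day(1483228800)` is 0 and `heartbeat_timeout_ms(1483228800, 2)` is 2000, while at 1483228800 - 43200 (noon on 2016-12-31) the timeout is 1000.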

Now this seems like a nice and easy way of fixing old code. Here
is an example:


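/* Declarations this example relies on (helpers not shown): */
typedef enum { EVENT_HEARTBEAT /* , ... other events */ } Event;
long long gettimeofday_in_millisecs(void); /* wall-clock time in ms */
void peer_has_failed(void);

static long long last_recv_time; /* ms timestamp of last heartbeat */

void process_event(Event e)
{
   long long now = gettimeofday_in_millisecs();
   /* No heartbeat for more than 1000 ms: declare the peer dead. */
   if (now > last_recv_time + 1000) {
      peer_has_failed();
   } else if (e == EVENT_HEARTBEAT) {
      last_recv_time = now;
   }
}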


Becomes:


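#define FUDGE 2 /* seconds either side of midnight UTC */

/* Nonzero if t (seconds since the epoch) is within FUDGE seconds of
 * a UTC day boundary, i.e. 23:59:58 through 00:00:02. */
int near_turnover_of_day(long long t)
{
   return (t + FUDGE) % 86400 <= FUDGE * 2;
}

void process_event(Event e)
{
   long long now = gettimeofday_in_millisecs();
   /* Allow an extra 1000 ms around midnight UTC, where a leap
    * second may step the clock. */
   long long timeout = 1000 + 1000 * near_turnover_of_day(now / 1000);

   if (now > last_recv_time + timeout) {
      peer_has_failed();
   } else if (e == EVENT_HEARTBEAT) {
      last_recv_time = now;
   }
}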
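To see exactly which seconds the `(t + FUDGE) % 86400 <= FUDGE * 2` test treats as close to the turnover, it can be evaluated over every second of one UTC day. The sketch below repeats that predicate so it compiles on its own; `count_flagged_seconds` is a hypothetical helper, and t is assumed to be a non-negative POSIX time.

```c
#include <assert.h>

#define FUDGE 2 /* seconds either side of UTC midnight, as in the post */

/* Copy of the post's predicate: nonzero when t (seconds since the
 * epoch, assumed >= 0) is within FUDGE seconds of a UTC midnight. */
static int near_turnover_of_day(long long t)
{
    return (t + FUDGE) % 86400 <= FUDGE * 2;
}

/* Hypothetical helper: how many seconds of the UTC day starting at
 * day_start (a multiple of 86400) the predicate flags. */
static int count_flagged_seconds(long long day_start)
{
    int count = 0;
    for (long long t = day_start; t < day_start + 86400; t++)
        count += near_turnover_of_day(t);
    return count;
}
```

For the UTC day starting at 1483142400 (2016-12-31 00:00:00 UTC), `count_flagged_seconds` returns 5: seconds-of-day 0, 1 and 2 at the start, and 86398 and 86399 at the end. Across a midnight, the widened timeout therefore applies from 23:59:58 through 00:00:02 UTC.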


Comments?

(This example is off the top of my head, so please excuse any errors.)

-paul

_______________________________________________
LEAPSECS mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/leapsecs
