On Tue, Aug 02, 2016 at 10:08:04PM +0000, Keller, Jacob E wrote:
> Here? Wouldn't these two port dispatch things be racing with the ones
> above?

I don't think the link events will spoil the existing timeouts.  Here
are the currently defined fault types.

  FT_UNSPECIFIED - We use this for any kind of socket IO error and
  also in case of low memory.  It is very non-specific.  The most
  likely cause of a socket IO error is a downed link.  This fault also
  covers missing time stamps from flaky or limited HW or drivers.
  Regardless of the specific cause, we simply wait for a defined
  number of seconds (a power of 2) and hope the problem goes away.

  FT_BAD_PEER_NETWORK - This was added to support gPTP which requires
  a linear number of seconds (not 2^N) for this specific fault.

  FT_SWITCH_PHC - This is used to distinguish a possible fault when
  switching clocks in the "jbod" mode.  The value is hard coded at 16
  seconds, without any config option at all.  The purpose of this
  fault is really debugging.  In case the jbod mode ever explodes in
  the wild, we will be able to see the cause.
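
To make the two timeout flavors concrete, here is a rough sketch of
how a fault timeout value could be turned into seconds.  The names
(tmo_kind, fault_tmo_seconds) are made up for illustration and are not
taken from the code, and the handling of negative exponents is an
assumption on my part.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the project's identifiers. */
enum tmo_kind { TMO_LINEAR_SECONDS, TMO_LOG2_SECONDS };

/* Seconds to wait before leaving the faulty state. */
double fault_tmo_seconds(enum tmo_kind kind, int val)
{
	double s = 1.0;
	int i;

	/* Linear flavor: the value is already a number of seconds. */
	if (kind == TMO_LINEAR_SECONDS)
		return (double)val;

	/*
	 * Power-of-two flavor: 2^val seconds.  Treating a negative val
	 * as a fraction of a second is an assumption of this sketch.
	 */
	assert(val > -64 && val < 64);
	for (i = 0; i < val; i++)
		s *= 2.0;
	for (i = 0; i > val; i--)
		s /= 2.0;
	return s;
}
```

With this, fault_tmo_seconds(TMO_LOG2_SECONDS, 4) gives the 16 seconds
mentioned for FT_SWITCH_PHC, while fault_tmo_seconds(TMO_LINEAR_SECONDS, 15)
is simply 15 seconds.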

Here is the new state machine (ascii art!):

          +--------+    Fault    +---------+
          |        |------------>|         |
          |   UP   |             |  FAULT  |
          |        |<------------|         |
          +--+--+--+   Timeout   +---------+
             A  |                   /
             |  |                  /
   Link-Up   |  | Link-Down       /
             |  |                /
             |  V               /
          +--+--+--+           /  Link-Down
          |        |          /
          |  DOWN  |<--------/
          |        |
          +--------+

If the fault timer expires in the DOWN state, we simply ignore it.
After all, without the link the port is useless.
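
The diagram can be written down as a small transition function.  This
is only a sketch with made-up names, not the code from the patch.
Events that have no arrow in the diagram (for example a fault while
DOWN, or Link-Up while already UP) are assumed to leave the state
unchanged.

```c
#include <assert.h>

/* Hypothetical names that mirror the diagram, not the real identifiers. */
enum link_state { ST_UP, ST_FAULT, ST_DOWN };
enum link_event { EV_FAULT, EV_TIMEOUT, EV_LINK_UP, EV_LINK_DOWN };

enum link_state next_state(enum link_state s, enum link_event e)
{
	switch (e) {
	case EV_LINK_DOWN:
		/* Both UP and FAULT drop to DOWN; DOWN stays DOWN. */
		return ST_DOWN;
	case EV_LINK_UP:
		/* Only DOWN reacts to Link-Up (assumed no-op otherwise). */
		return s == ST_DOWN ? ST_UP : s;
	case EV_FAULT:
		/* Only UP can enter FAULT (assumed no-op otherwise). */
		return s == ST_UP ? ST_FAULT : s;
	case EV_TIMEOUT:
		/* The fault timer only matters in FAULT; ignored in DOWN. */
		return s == ST_FAULT ? ST_UP : s;
	}
	assert(0 && "unknown event");
	return s;
}
```

Feeding it EV_FAULT, EV_LINK_DOWN, EV_LINK_UP starting from ST_UP
walks UP -> FAULT -> DOWN -> UP, and a late EV_TIMEOUT arriving in
ST_DOWN changes nothing.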

There is one case where the new code changes the existing behavior.
If the link quickly goes down and then up again while another fault
(and its timer) is active, then we will enter the UP state without
waiting for the fault timer expiration.  However, I consider this
behavior acceptable, because when a link goes up, you are starting
with a clean slate.  Even the network might have changed, so the
FT_BAD_PEER_NETWORK fault is then dubious.

I would welcome even more comprehensive fault handling, if someone
wants to code it.  But having this more intelligent link handling is
an improvement and at least a step in the right direction, IMHO.

Thoughts?

Thanks,
Richard


------------------------------------------------------------------------------
_______________________________________________
Linuxptp-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
