On Thu, Dec 24, 2020 at 06:36:19PM +0200, Grygorii Strashko wrote:
> The ptp4l/phc2sys BC can stuck in sync fault while port is in transitional
> UNCALIBRATED state:
>  MASTER -> UNCALIBRATED
>  UNCALIBRATED -> SLAVE
> in the later case state change sequence is
>  UNCALIBRATED -> SLAVE -> SYNCHRONIZATION_FAULT -> UNCALIBRATED.
> Once issue happens the BC or GM has to be restarted to recover.
> 
> To trigger an issue the GM clock has to be restarted few times.
> 
> The cause of the issue is that both ptp4l and phc2sys could be syncing the
> same PHC clock while Port is in UNCALIBRATED state. This happens because in
> reconfigure() the check [1]
> 
> [1]   if (dst_cnt > 1 && !src) {
>       ...
>       if (src_cnt > 1) {
>       ...
> [2]   if (src_cnt > 0 && !src) {
>       ...
>       if (!src_cnt && !dst_cnt) {
>       ...
> 
> is defined first and selects last clock in the clocks list (usually first
> defined in ptp4l cfg file) as the master clock even if there is some other
> clock(Port) in transitional UNCALIBRATED state. As result, phc2sys will
> start syncing the same clock as ptp4l, and ptp4l has no chances to
> stabilize Port in UNCALIBRATED state.
> Moreover, check [1] bypasses most of follow up checks unless CLOCK_REALTIME
> is allowed to be master clock.
> 
> The original commit 46a0b281b915 ("phc2sys: autoconfiguration") gave ptp4l
> chance to proceed by introducing check [2]. But follow up commit
> 2ab2fbbdda86 ("phc2sys: default to the first clock in automatic mode.")
> introduced check [1] and so this issue.
> 
> To fix an issue the check [2] is moved down which makes phc2sys to wait for
> ptp4l to settle.
> 
> Signed-off-by: Grygorii Strashko <grygorii.stras...@ti.com>
> ---

I don't think Richard does merge commits, so your logs from the cover
letter are going to get lost, save for the few who will search for the
commit on the mailing archives. The patch is ok (the intention probably
was to select a default source clock only when there were no candidates
at all), but I think the commit message leaves the reader wanting to
know how come the clock with PS_UNCALIBRATED can ever reach inside the
priv->dst_clocks list. The clue is only given by the logs that you
posted in the cover letter, so I would suggest either copying them here
too, or just verbally explaining better.

>  phc2sys.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/phc2sys.c b/phc2sys.c
> index b15127d..fd80afd 100644
> --- a/phc2sys.c
> +++ b/phc2sys.c
> @@ -419,19 +419,6 @@ static void reconfigure(struct phc2sys_private *priv)
>               }
>               last = c;
>       }
> -     if (dst_cnt > 1 && !src) {
> -             if (!rt || rt->dest_only) {
> -                     priv->master = last;
> -                     /* Reset to original state in next reconfiguration. */
> -                     priv->master->new_state = priv->master->state;
> -                     priv->master->state = PS_SLAVE;
> -                     if (rt)
> -                             rt->state = PS_SLAVE;
> -                     pr_info("no source, selecting %s as the default clock",
> -                             last->device);
> -                     return;
> -             }
> -     }
>       if (src_cnt > 1) {
>               pr_info("multiple master clocks available, postponing sync...");
>               priv->master = NULL;
> @@ -447,6 +434,21 @@ static void reconfigure(struct phc2sys_private *priv)
>               priv->master = NULL;
>               return;
>       }
> +
> +     if (dst_cnt > 1 && !src) {
> +             if (!rt || rt->dest_only) {
> +                     priv->master = last;
> +                     /* Reset to original state in next reconfiguration. */
> +                     priv->master->new_state = priv->master->state;
> +                     priv->master->state = PS_SLAVE;
> +                     if (rt)
> +                             rt->state = PS_SLAVE;
> +                     pr_info("no source, selecting %s as the default clock",
> +                             last->device);
> +                     return;
> +             }
> +     }
> +
>       if ((!src_cnt && (!rt || rt->dest_only)) ||
>           (!dst_cnt && !rt)) {
>               pr_info("nothing to synchronize");
> -- 
> 2.17.1
> 


_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Reply via email to