From: Vincent Cheng <vincent.cheng...@renesas.com>

Problem
=======
Unselected master port may unintentionally continue to request 
announce/sync/delay_resp packets. 
Expecting port with unselected master to only maintain announce messages.


Setup
=====
Client running with unicast mode and 2 ports, each port has single entry 
unicast_master_table.

=======================================
| Client                              |
|   clockClass              248       |
|   dataset_comparison      G.8275.x  |
|   clock_type              BC        |
|   unicast_req_duration    10        |
|                                     |
| enp0s8                       enp0s9 |
| port 1                       port 2 |
=======================================
  |                              |
  |                              Server B
  |                              080027.fffe.fc19f1, clockClass 7
  |
  Server A
  080027.fffe.fd0000, clockClass 7

unicast_req_duration is shortened so don't have to wait as long to see effect.

Server A & B must have same timing properties to cause port state PASSIVE.
ie. telecom_dscmp() -> dscmp2() to compare the port identity to determine best 
master.

If we don't enter PASSIVE state, it is harder to see the problem.
If we use clockClass to distinguish the masters, the problem is still there, 
but quickly fixed in subsequent PRE-MASTER -> MASTER state decision event.


Client
------
clockClass              248
dataset_comparison      G.8275.x
clock_type              BC
unicast_req_duration    10


Test
====
1. Start Client and Server A.
2. Port 1:Server A is best master.
           port 1: MASTER to UNCALIBRATED
           port 2: MASTER
3. Start Server B.
4. Port 2:Server B is best master.
           port 1: UNCALIBRATED to PASSIVE
           port 2: MASTER to UNCALIBRATED
5. Port 1 continues to request annc/sync/delay_resp on next renewal


Failure log (with some debug annotations)
===========

# Start Client and Server A
ptp4l[82910.789]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[82910.800]: port 1 (enp0s8): new foreign master 080027.fffe.fd0000-1
...

# Server A is best master
ptp4l[82915.056]: selected best master clock 080027.fffe.fd0000

ptp4l[82915.056]: port 1 (enp0s8): unicast_fsm: UC_HAVE_ANN -> UC_NEED_SYDY on 
UC_EV_SELECTED
...
ptp4l[82922.789]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[82922.790]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[82922.790]: port 1 (enp0s8): unicast SYNC granted for 10 sec
ptp4l[82922.790]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec
...
ptp4l[82938.792]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[82938.793]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[82938.793]: port 1 (enp0s8): unicast SYNC granted for 10 sec
ptp4l[82938.793]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec

# Start Server B
ptp4l[82939.057]: port 2 (enp0s9): new foreign master 080027.fffe.fc19f1-1

# Port 2:Server B is best master
ptp4l[82943.311]: selected best master clock 080027.fffe.fc19f1

ptp4l[82943.311]: port 1 (enp0s8): unicast_fsm: UC_HAVE_SYDY -> UC_HAVE_SYDY on 
UC_EV_SELECTED
ptp4l[82943.311]: port 1 (enp0s8): UNCALIBRATED to PASSIVE on RS_PASSIVE

ptp4l[82943.311]: port 2 (enp0s9): unicast_fsm: UC_HAVE_ANN -> UC_NEED_SYDY on 
UC_EV_SELECTED
ptp4l[82943.311]: port 2 (enp0s9): MASTER to UNCALIBRATED on RS_SLAVE
...

# Port 1 continues to request annc/sync/delay_resp on subsequent unicast renewal
ptp4l[82946.793]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[82946.794]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[82946.795]: port 1 (enp0s8): unicast SYNC granted for 10 sec
ptp4l[82946.795]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec
...
ptp4l[82950.794]: port 2 (enp0s9): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[82950.795]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec
ptp4l[82950.795]: port 2 (enp0s9): unicast SYNC granted for 10 sec
ptp4l[82950.795]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec
...
ptp4l[82954.795]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[82954.797]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[82954.797]: port 1 (enp0s8): unicast SYNC granted for 10 sec
ptp4l[82954.797]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec
...
ptp4l[82958.796]: port 2 (enp0s9): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[82958.797]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec
ptp4l[82958.797]: port 2 (enp0s9): unicast SYNC granted for 10 sec
ptp4l[82958.797]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec


Cause
=====

  handle_state_decision_event():

        c->best = best;                                         // *** parent 
pid may have changed
        c->best_id = best_id;

        LIST_FOREACH(piter, &c->ports, list) {
                enum port_state ps;
                enum fsm_event event;
                ps = bmc_state_decision(c, piter, c->dscmp);
                switch (ps) {
                case PS_LISTENING:
                        event = EV_NONE;
                        break;
                case PS_GRAND_MASTER:
                        pr_notice("%s: assuming the grand master role",
                                  port_log_name(piter));
                        clock_update_grandmaster(c);            // *** Updates 
parent pid
                        event = EV_RS_GRAND_MASTER;
                        break;
                case PS_MASTER:
                        event = EV_RS_MASTER;
                        break;
                case PS_PASSIVE:
                        event = EV_RS_PASSIVE;
                        break;
                case PS_SLAVE:
                        clock_update_slave(c);                  // *** Updates 
parent pid
                        event = EV_RS_SLAVE;
                        break;
                default:
                        event = EV_FAULT_DETECTED;
                        break;
                }
                port_dispatch(piter, event, fresh_best);        // *** Uses 
parent pid
        }

The update of the clock's parent pid to reflect best master change is 
non-deterministic.
It depends on the port processing order and bmc_state_decision() results.

However, port_dispatch() references clock's parent pid in 
unicast_client_state_changed().

  port_dispatch()
        -> bc_dispatch()
                -> port_state_update()
                        -> unicast_client_state_changed()

  void unicast_client_state_changed(struct port *p)
  {
        ...
        pid = clock_parent_identity(p->clock);
        ...
        STAILQ_FOREACH(ucma, &p->unicast_master_table->addrs, list) {
                if (pid_eq(&ucma->portIdentity, &pid)) {
                        ucma->state = unicast_fsm(ucma->state, UC_EV_SELECTED);
                } else {
                        ucma->state = unicast_fsm(ucma->state, 
UC_EV_UNSELECTED);
                }
        }

The clock's parent pid may be stale when used in unicast_client_state_changed() 
and put the UC FSM in wrong state.

In above failure scenario, Port 1's wrong UC FSM state causes the unexpected 
renewal of annc/sync/delay_resp.


Solution
========
  Update the clock's parent identity prior to port iteration.

         c->best = best;
         c->best_id = best_id;

        +clock_update_parent_identity(c);


After fix
---------
# Start Client and Server A
ptp4l[93271.613]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[93275.878]: port 1 (enp0s8): new foreign master 080027.fffe.fd0000-1

# Port 1:Server A is best master
ptp4l[93280.133]: selected best master clock 080027.fffe.fd0000

ptp4l[93280.133]: port 1 (enp0s8): unicast_fsm: UC_NEED_SYDY -> UC_NEED_SYDY on 
UC_EV_SELECTED
ptp4l[93283.616]: port 1 (enp0s8): unicast SYNC granted for 10 sec
ptp4l[93283.616]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec
...
ptp4l[93287.615]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[93287.616]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
ptp4l[93287.616]: port 1 (enp0s8): unicast SYNC granted for 10 sec
ptp4l[93287.616]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec

# Start Server B
ptp4l[93302.011]: port 2 (enp0s9): new foreign master 080027.fffe.fc19f1-1

# Port 2:Server B is best master
ptp4l[93306.264]: selected best master clock 080027.fffe.fc19f1

#
# port 1's UC FSM is in correct state, previously stayed in UC_HAVE_SYDY state
#
ptp4l[93306.264]: port 1 (enp0s8): unicast_fsm: UC_HAVE_SYDY -> UC_HAVE_ANN on 
UC_EV_UNSELECTED
ptp4l[93306.264]: port 1 (enp0s8): UNCALIBRATED to PASSIVE on RS_PASSIVE

ptp4l[93306.264]: port 2 (enp0s9): unicast_fsm: UC_HAVE_ANN -> UC_NEED_SYDY on 
UC_EV_SELECTED
ptp4l[93306.264]: port 2 (enp0s9): MASTER to UNCALIBRATED on RS_SLAVE

# Port 1 ** renews annc only **
ptp4l[93311.621]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_ANN)
ptp4l[93311.622]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
...
ptp4l[93311.621]: port 2 (enp0s9): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[93311.622]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec
ptp4l[93311.622]: port 2 (enp0s9): unicast SYNC granted for 10 sec
ptp4l[93311.622]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec
...
ptp4l[93319.622]: port 1 (enp0s8): time to renew unicast subscriptions 
(UC_HAVE_ANN)
ptp4l[93319.623]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec
...
ptp4l[93319.622]: port 2 (enp0s9): time to renew unicast subscriptions 
(UC_HAVE_SYDY)
ptp4l[93319.623]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec
ptp4l[93319.623]: port 2 (enp0s9): unicast SYNC granted for 10 sec
ptp4l[93319.623]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec


v2:
Suggested by: Richard Cochran <richardcoch...@gmail.com>
- v1 fix may update parentPortIdentity incorrectly in some cases
  Instead, set a dirty flag and move call of unicast_client_state_change()
  to the end of handle_state_decision_event()

Suggested by: Erez Geva <erez.geva....@siemens.com>
- Add comment to explain setting portNumber = 0

Vincent Cheng (1):
  clock: Fix stale clock parent pid usage after best master change

 clock.c        |  6 ++++++
 port.c         | 12 ++++++++++--
 port.h         |  7 +++++++
 port_private.h |  1 +
 4 files changed, 24 insertions(+), 2 deletions(-)

-- 
2.34.1



_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Reply via email to