From: Vincent Cheng <vincent.cheng...@renesas.com> Problem ======= Unselected master port may unintentionally continue to request announce/sync/delay_resp packets. Expecting port with unselected master to only maintain announce messages.
Setup ===== Client running with unicast mode and 2 ports, each port has single entry unicast_master_table. ======================================= | Client | | clockClass 248 | | dataset_comparison G.8275.x | | clock_type BC | | unicast_req_duration 10 | | | | enp0s8 enp0s9 | | port 1 port 2 | ======================================= | | | Server B | 080027.fffe.fc19f1, clockClass 7 | Server A 080027.fffe.fd0000, clockClass 7 unicast_req_duration is shortened so don't have to wait as long to see effect. Server A & B must have same timing properties to cause port state PASSIVE. ie. telecom_dscmp() -> dscmp2() to compare the port identity to determine best master. If we don't enter PASSIVE state, it is harder to see the problem. If we use clockClass to distinguish the masters, the problem is still there, but quickly fixed in subsequent PRE-MASTER -> MASTER state decision event. Client ------ clockClass 248 dataset_comparison G.8275.x clock_type BC unicast_req_duration 10 Test ==== 1. Start Client and Server A. 2. Port 1:Server A is best master. port 1: MASTER to UNCALIBRATED port 2: MASTER 3. Start Server B. 4. Port 2:Server B is best master. port 1: UNCALIBRATED to PASSIVE port 2: MASTER to UNCALIBRATED 5. Port 1 continues to request annc/sync/delay_resp on next renewal Failure log (with some debug annotations) =========== # Start Client and Server A ptp4l[82910.789]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[82910.800]: port 1 (enp0s8): new foreign master 080027.fffe.fd0000-1 ... # Server A is best master ptp4l[82915.056]: selected best master clock 080027.fffe.fd0000 ptp4l[82915.056]: port 1 (enp0s8): unicast_fsm: UC_HAVE_ANN -> UC_NEED_SYDY on UC_EV_SELECTED ... ptp4l[82922.789]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[82922.790]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[82922.790]: port 1 (enp0s8): unicast SYNC granted for 10 sec ptp4l[82922.790]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec ... ptp4l[82938.792]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[82938.793]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[82938.793]: port 1 (enp0s8): unicast SYNC granted for 10 sec ptp4l[82938.793]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec # Start Server B ptp4l[82939.057]: port 2 (enp0s9): new foreign master 080027.fffe.fc19f1-1 # Port 2:Server B is best master ptp4l[82943.311]: selected best master clock 080027.fffe.fc19f1 ptp4l[82943.311]: port 1 (enp0s8): unicast_fsm: UC_HAVE_SYDY -> UC_HAVE_SYDY on UC_EV_SELECTED ptp4l[82943.311]: port 1 (enp0s8): UNCALIBRATED to PASSIVE on RS_PASSIVE ptp4l[82943.311]: port 2 (enp0s9): unicast_fsm: UC_HAVE_ANN -> UC_NEED_SYDY on UC_EV_SELECTED ptp4l[82943.311]: port 2 (enp0s9): MASTER to UNCALIBRATED on RS_SLAVE ... # Port 1 continues to request annc/sync/delay_resp on subsequent unicast renewal ptp4l[82946.793]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[82946.794]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[82946.795]: port 1 (enp0s8): unicast SYNC granted for 10 sec ptp4l[82946.795]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec ... ptp4l[82950.794]: port 2 (enp0s9): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[82950.795]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec ptp4l[82950.795]: port 2 (enp0s9): unicast SYNC granted for 10 sec ptp4l[82950.795]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec ... ptp4l[82954.795]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[82954.797]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[82954.797]: port 1 (enp0s8): unicast SYNC granted for 10 sec ptp4l[82954.797]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec ... ptp4l[82958.796]: port 2 (enp0s9): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[82958.797]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec ptp4l[82958.797]: port 2 (enp0s9): unicast SYNC granted for 10 sec ptp4l[82958.797]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec Cause ===== handle_state_decision_event(): c->best = best; // *** parent pid may have changed c->best_id = best_id; LIST_FOREACH(piter, &c->ports, list) { enum port_state ps; enum fsm_event event; ps = bmc_state_decision(c, piter, c->dscmp); switch (ps) { case PS_LISTENING: event = EV_NONE; break; case PS_GRAND_MASTER: pr_notice("%s: assuming the grand master role", port_log_name(piter)); clock_update_grandmaster(c); // *** Updates parent pid event = EV_RS_GRAND_MASTER; break; case PS_MASTER: event = EV_RS_MASTER; break; case PS_PASSIVE: event = EV_RS_PASSIVE; break; case PS_SLAVE: clock_update_slave(c); // *** Updates parent pid event = EV_RS_SLAVE; break; default: event = EV_FAULT_DETECTED; break; } port_dispatch(piter, event, fresh_best); // *** Uses parent pid } The update of the clock's parent pid to reflect best master change is non-deterministic. It depends on the port processing order and bmc_state_decision() results. However, port_dispatch() references clock's parent pid in unicast_client_state_changed(). port_dispatch() -> bc_dispatch() -> port_state_update() -> unicast_client_state_changed() void unicast_client_state_changed(struct port *p) { ... pid = clock_parent_identity(p->clock); ... STAILQ_FOREACH(ucma, &p->unicast_master_table->addrs, list) { if (pid_eq(&ucma->portIdentity, &pid)) { ucma->state = unicast_fsm(ucma->state, UC_EV_SELECTED); } else { ucma->state = unicast_fsm(ucma->state, UC_EV_UNSELECTED); } } The clock's parent pid may be stale when used in unicast_client_state_changed() and put the UC FSM in wrong state. In above failure scenario, Port 1's wrong UC FSM state causes the unexpected renewal of annc/sync/delay_resp. Solution ======== Update the clock's parent identity prior to port iteration. c->best = best; c->best_id = best_id; +clock_update_parent_identity(c); After fix --------- # Start Client and Server A ptp4l[93271.613]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[93275.878]: port 1 (enp0s8): new foreign master 080027.fffe.fd0000-1 # Port 1:Server A is best master ptp4l[93280.133]: selected best master clock 080027.fffe.fd0000 ptp4l[93280.133]: port 1 (enp0s8): unicast_fsm: UC_NEED_SYDY -> UC_NEED_SYDY on UC_EV_SELECTED ptp4l[93283.616]: port 1 (enp0s8): unicast SYNC granted for 10 sec ptp4l[93283.616]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec ... ptp4l[93287.615]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[93287.616]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ptp4l[93287.616]: port 1 (enp0s8): unicast SYNC granted for 10 sec ptp4l[93287.616]: port 1 (enp0s8): unicast DELAY_RESP granted for 10 sec # Start Server B ptp4l[93302.011]: port 2 (enp0s9): new foreign master 080027.fffe.fc19f1-1 # Port 2:Server B is best master ptp4l[93306.264]: selected best master clock 080027.fffe.fc19f1 # # port 1's UC FSM is in correct state, previously stayed in UC_HAVE_SYDY state # ptp4l[93306.264]: port 1 (enp0s8): unicast_fsm: UC_HAVE_SYDY -> UC_HAVE_ANN on UC_EV_UNSELECTED ptp4l[93306.264]: port 1 (enp0s8): UNCALIBRATED to PASSIVE on RS_PASSIVE ptp4l[93306.264]: port 2 (enp0s9): unicast_fsm: UC_HAVE_ANN -> UC_NEED_SYDY on UC_EV_SELECTED ptp4l[93306.264]: port 2 (enp0s9): MASTER to UNCALIBRATED on RS_SLAVE # Port 1 ** renews annc only ** ptp4l[93311.621]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_ANN) ptp4l[93311.622]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ... ptp4l[93311.621]: port 2 (enp0s9): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[93311.622]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec ptp4l[93311.622]: port 2 (enp0s9): unicast SYNC granted for 10 sec ptp4l[93311.622]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec ... ptp4l[93319.622]: port 1 (enp0s8): time to renew unicast subscriptions (UC_HAVE_ANN) ptp4l[93319.623]: port 1 (enp0s8): unicast ANNOUNCE granted for 10 sec ... ptp4l[93319.622]: port 2 (enp0s9): time to renew unicast subscriptions (UC_HAVE_SYDY) ptp4l[93319.623]: port 2 (enp0s9): unicast ANNOUNCE granted for 10 sec ptp4l[93319.623]: port 2 (enp0s9): unicast SYNC granted for 10 sec ptp4l[93319.623]: port 2 (enp0s9): unicast DELAY_RESP granted for 10 sec Patch based on main branch, commit 3c2840f2. Vincent Cheng (1): clock: Fix stale clock parent pid usage after best master change clock.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) -- 2.34.1 _______________________________________________ Linuxptp-devel mailing list Linuxptp-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linuxptp-devel