Hi I've found that it's pretty easy to put ptp4l/phc2sys BC in sync fault state while port is in transitional UNCALIBRATED state: MASTER -> UNCALIBRATED UNCALIBRATED -> SLAVE in the later case state change sequence is UNCALIBRATED -> SLAVE -> SYNCHRONIZATION_FAULT -> UNCALIBRATED. Once issue happens the BC or GM has to be restarted to recover (not always helps).
To trigger an issue the GM clock has to be restarted few times. The cause of the issue is that both ptp4l and phc2sys could be syncing the same PHC clock while Port is in UNCALIBRATED state. This happens because in reconfigure() the check [1] [1] if (dst_cnt > 1 && !src) { ... if (src_cnt > 1) { ... [2] if (src_cnt > 0 && !src) { ... if (!src_cnt && !dst_cnt) { ... is defined first and selects last clock in the clocks list (usually first defined in ptp4l cfg file) as the master clock even if there is some other clock(Port) is in transitional UNCALIBRATED state. As result, phc2sys will start syncing the same clock as ptp4l, and ptp4l has no chances to stabilize Port in UNCALIBRATED state. Moreover, check [1] bypasses most of follow up checks unless CLOCK_REALTIME is allowed to be master clock. The original commit 46a0b281b915 ("phc2sys: autoconfiguration") gave ptp4l a chance to proceed by introducing check [2]. But follow up commit 2ab2fbbdda86 ("phc2sys: default to the first clock in automatic mode.") introduced check [1] and so this issue. To fix an issue the check [2] is moved down which makes phc2sys to wait for ptp4l to settle. Patch 1 - add debug prints when clock state is changed Patch 2 - fixes an issue BC configuration (jbod): - eth1, phc index 0 - pru10, phc index 1 - pru11, phc index 1 - pru20, phc index 2 <-- GM - eth21, phc index 2 BC started as: sleep ./linuxptp/ptp4l -mf abb_bc.cfg -l5 & sleep 2 ./linuxptp/phc2sys -a -S 0.00002 -L 0 -m -l7 -u$1 & Example log: ptp4l[688.126]: port 4: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED phc2sys[688.294]: port 5cf821.fffe.3b0305-4 changed state phc2sys[688.295]: reconfiguring after port state change phc2sys[688.295]: selecting pru21 for synchronization phc2sys[688.295]: pru20: state change UNCALIBRATED -> SLAVE phc2sys[688.295]: selecting pru11 for synchronization phc2sys[688.295]: skipping pru10: pru11 has the same clock and is already selected phc2sys[688.296]: eth1: state change SLAVE -> MASTER phc2sys[688.296]: selecting eth1 for synchronization phc2sys[688.296]: selecting pru20 as the master clock ^^^^^^ GM detected properly and BC sync switched to pru20 as master phc2sys[688.296]: eth1 rms 21 max 21 freq +22755 +/- 0 delay 26781 +/- 0 phc2sys[688.296]: pru11 rms 134 max 134 freq +22664 +/- 0 delay 5200 +/- 0 phc2sys[689.297]: eth1 rms 554 max 554 freq +23376 +/- 0 delay 26135 +/- 0 ... phc2sys[693.299]: pru11 rms 205 max 205 freq +22725 +/- 0 delay 5135 +/- 0 ptp4l[694.126]: port 4: SLAVE to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES ptp4l[694.126]: selected local clock 5cf821.fffe.3b0305 as best master ptp4l[694.126]: port 1: assuming the grand master role ptp4l[694.126]: port 2: assuming the grand master role ptp4l[694.126]: port 3: assuming the grand master role ptp4l[694.126]: port 4: assuming the grand master role ptp4l[694.126]: port 5: assuming the grand master role phc2sys[694.299]: port 5cf821.fffe.3b0305-4 changed state phc2sys[694.300]: reconfiguring after port state change phc2sys[694.300]: selecting pru21 for synchronization phc2sys[694.300]: pru20: state change SLAVE -> MASTER phc2sys[694.300]: skipping pru20: pru21 has the same clock and is already selected phc2sys[694.300]: selecting pru11 for synchronization phc2sys[694.300]: skipping pru10: pru11 has the same clock and is already selected phc2sys[694.300]: selecting eth1 for synchronization phc2sys[694.300]: no source, selecting eth1 as the default clock ^^^^^^ GM stopped and BC sync switched to eth1 as master phc2sys[694.301]: pru11 rms 582 max 582 freq +23450 +/- 0 delay 26930 +/- 0 phc2sys[694.303]: pru21 rms 686 max 686 freq +23427 +/- 0 delay 26929 +/- 0 ... phc2sys[700.310]: pru11 rms 214 max 214 freq +22603 +/- 0 delay 26844 +/- 0 phc2sys[700.311]: pru21 rms 111 max 111 freq +22310 +/- 0 delay 26485 +/- 0 ptp4l[701.130]: selected best master clock 5051a9.fffe.fc7697 ptp4l[701.130]: port 4: MASTER to UNCALIBRATED on RS_SLAVE phc2sys[701.311]: port 5cf821.fffe.3b0305-4 changed state phc2sys[701.313]: reconfiguring after port state change phc2sys[701.313]: selecting pru21 for synchronization phc2sys[701.313]: pru20: state change MASTER -> UNCALIBRATED phc2sys[701.313]: selecting pru11 for synchronization phc2sys[701.313]: skipping pru10: pru11 has the same clock and is already selected phc2sys[701.313]: eth1: state change SLAVE -> MASTER phc2sys[701.313]: selecting eth1 for synchronization phc2sys[701.313]: no source, selecting eth1 as the default clock ^^^^^^ GM detected pru20=UNCALIBRATED, but BC sync still uses eth1 as master at this moment both ptp4l and phc2sys syncing the same pru20 clock phc2sys[701.314]: pru11 rms 41 max 41 freq +22412 +/- 0 delay 26805 +/- 0 phc2sys[701.314]: pru21 rms 266 max 266 freq +22654 +/- 0 delay 27059 +/- 0 phc2sys[702.314]: pru11 rms 73 max 73 freq +22368 +/- 0 delay 26644 +/- 0 phc2sys[702.314]: pru21 rms 134 max 134 freq +22334 +/- 0 delay 26774 +/- 0 phc2sys[703.315]: pru11 rms 106 max 106 freq +22313 +/- 0 delay 26509 +/- 0 phc2sys[703.315]: pru21 rms 4554 max 4554 freq +17874 +/- 0 delay 26539 +/- 0 phc2sys[704.315]: pru11 rms 34 max 34 freq +22421 +/- 0 delay 26540 +/- 0 phc2sys[704.317]: pru21 rms 0 max -0 freq +21061 +/- 0 delay 26505 +/- 0 phc2sys[705.317]: pru11 rms 4 max 4 freq +22401 +/- 0 delay 26520 +/- 0 phc2sys[705.317]: pru21 rms 1548 max 1548 freq +22609 +/- 0 delay 26885 +/- 0 phc2sys[706.318]: pru11 rms 140 max 140 freq +22539 +/- 0 delay 26729 +/- 0 phc2sys[706.318]: pru21 rms 5790 max 5790 freq +15736 +/- 0 delay 26399 +/- 0 phc2sys[707.318]: pru11 rms 108 max 108 freq +22549 +/- 0 delay 26900 +/- 0 phc2sys[707.319]: pru21 rms 1204 max 1204 freq +20993 +/- 0 delay 27145 +/- 0 ... phc2sys[752.388]: pru11 rms 114 max 114 freq +22360 +/- 0 delay 26744 +/- 0 phc2sys[752.389]: pru21 rms 3866487 max 3866487 freq -972697 +/- 0 delay 26663 +/- 0 phc2sys[753.389]: pru11 rms 81 max 81 freq +22359 +/- 0 delay 26644 +/- 0 phc2sys[753.390]: pru21 rms 6925537 max 6925537 freq -4039400 +/- 0 delay 27141 +/- 0 ^^^^^^ result: pru20 stuck in UNCALIBRATED state and it's clock going out of sync Log after fix: ptp4l[265.606]: port 4: SLAVE to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES ptp4l[265.606]: selected local clock 5cf821.fffe.3b0305 as best master ptp4l[265.606]: port 1: assuming the grand master role ptp4l[265.606]: port 2: assuming the grand master role ptp4l[265.606]: port 3: assuming the grand master role ptp4l[265.606]: port 4: assuming the grand master role ptp4l[265.606]: port 5: assuming the grand master role phc2sys[265.618]: port 5cf821.fffe.3b0305-4 changed state phc2sys[265.618]: reconfiguring after port state change phc2sys[265.618]: selecting pru21 for synchronization phc2sys[265.618]: pru20: state change SLAVE -> MASTER phc2sys[265.618]: skipping pru20: pru21 has the same clock and is already selected phc2sys[265.618]: selecting pru11 for synchronization phc2sys[265.618]: skipping pru10: pru11 has the same clock and is already selected phc2sys[265.618]: selecting eth1 for synchronization phc2sys[265.618]: no source, selecting eth1 as the default clock phc2sys[265.619]: pru11 rms 463 max 463 freq +23348 +/- 0 delay 26844 +/- 0 phc2sys[265.619]: pru21 rms 298 max 298 freq +23268 +/- 0 delay 26420 +/- 0 ... phc2sys[269.622]: pru11 rms 141 max 141 freq +22977 +/- 0 delay 26489 +/- 0 phc2sys[269.623]: pru21 rms 196 max 196 freq +22941 +/- 0 delay 26454 +/- 0 ptp4l[270.610]: selected best master clock 5051a9.fffe.fc7697 ptp4l[270.610]: port 4: MASTER to UNCALIBRATED on RS_SLAVE phc2sys[270.623]: port 5cf821.fffe.3b0305-4 changed state phc2sys[270.624]: reconfiguring after port state change phc2sys[270.624]: selecting pru21 for synchronization phc2sys[270.625]: pru20: state change MASTER -> UNCALIBRATED phc2sys[270.625]: selecting pru11 for synchronization phc2sys[270.625]: skipping pru10: pru11 has the same clock and is already selected phc2sys[270.625]: eth1: state change SLAVE -> MASTER phc2sys[270.625]: selecting eth1 for synchronization phc2sys[270.625]: master clock not ready, waiting... ptp4l[272.609]: port 4: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED phc2sys[272.625]: port 5cf821.fffe.3b0305-4 changed state phc2sys[272.626]: reconfiguring after port state change phc2sys[272.626]: selecting pru21 for synchronization phc2sys[272.627]: pru20: state change UNCALIBRATED -> SLAVE phc2sys[272.627]: selecting pru11 for synchronization phc2sys[272.627]: skipping pru10: pru11 has the same clock and is already selected phc2sys[272.627]: selecting eth1 for synchronization phc2sys[272.627]: selecting pru20 as the master clock phc2sys[272.627]: eth1 rms 175 max 175 freq +23076 +/- 0 delay 26405 +/- 0 phc2sys[272.627]: pru11 rms 64 max 64 freq +23012 +/- 0 delay 5144 +/- 0 phc2sys[273.628]: eth1 rms 428 max 428 freq +22675 +/- 0 delay 26488 +/- 0 phc2sys[273.628]: pru11 rms 232 max 232 freq +22825 +/- 0 delay 5160 +/- 0 Grygorii Strashko (2): phc2sys: add dbg print for clock state change events phc2sys: fix BC sync fault during port master-> uncalibrated transition phc2sys.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) -- 2.17.1 _______________________________________________ Linuxptp-devel mailing list Linuxptp-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linuxptp-devel