Re: [Etherlab-users] DC synchronization demo about etherlab master

Graeme Foot Mon, 05 May 2025 16:16:03 -0700

Hi Circle,

Re #1)


  *   ecrt_master_application_time() stores the PC time in the master.
  *   The ref slave clock is set to the master's time (plus its transmission 
delay) on activation.
  *   ecrt_master_sync_slave_clocks() syncs the subsequent slaves' clocks to the 
ref slave, and its datagram brings back the ref slave time.
  *   ecrt_master_reference_clock_time() gets the ref slave's time “slaveTime” 
(minus its transmission delay) from the previous 
ecrt_master_sync_slave_clocks() call, so it returns the ref slave's time as it 
was when the previous frame was sent.
  *   “ecMaster->m_dcTime” caches the time that was passed to 
ecrt_master_application_time() for the previous send.

So we can compare the difference between “(uint32_t)ecMaster->m_dcTime” and 
“slaveTime” and if there’s no drift or jitter between the master and slave 
clocks we should get a value of zero.  The rest of the code is attempting to 
filter out the jitter and calculate a drift compensation.  Note: we are 
comparing the lower 32 bits of the times.
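
A minimal sketch of that comparison, assuming the master time is kept as a 
64-bit nanosecond count and slaveTime is the 32-bit value read back from the 
ref slave.  The name dc_diff_ns is made up for illustration and is not part of 
the master API.  Subtracting in unsigned 32-bit arithmetic and then 
reinterpreting the result as signed keeps the difference correct across the 
32-bit rollover:

```c
#include <stdint.h>

/* Signed difference (ns) between the lower 32 bits of the master time and
 * the 32-bit ref slave time. The unsigned subtraction wraps modulo 2^32, so
 * the result stays small even when one value has rolled over and the other
 * has not. The cast to int32_t relies on two's complement conversion, which
 * is what common compilers such as GCC and Clang provide. */
static int32_t dc_diff_ns(uint64_t master_time_ns, uint32_t slave_time_ns)
{
    uint32_t raw = (uint32_t)master_time_ns - slave_time_ns;
    return (int32_t)raw;
}
```

For example, a master time of 1000 ns against a slave time of 1609 ns gives 
-609, and the result is unchanged if both values are shifted across a 32-bit 
boundary.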

“why are we using pc time - reference time to calculate m_dcDiff (m_dcTime 
- slaveTime, even it's named by m_dcTime, i think it's still a term of pc 
time)?”:
if there’s no drift and the ref slave clock has been set to the master clock on 
activation then the master and slave time should match.  If there’s drift, we 
need to adjust the master time to compensate.  (So it is comparing the PC clock 
time to the slave time to figure out the drift.)
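
One possible shape for the jitter filter and drift compensation, as a rough 
sketch only.  This is not the code from the application being discussed; 
drift_comp_t, drift_comp_update() and FILTER_CYCLES are hypothetical names and 
the window length is an arbitrary assumption.  It averages the diff over a 
window of cycles to filter out jitter, then nudges a per-cycle correction by 
1 ns in the direction that reduces the mean diff:

```c
#include <stdint.h>

#define FILTER_CYCLES 1000  /* assumed averaging window, in cycles */

typedef struct {
    int64_t diff_sum_ns;     /* sum of diffs in the current window */
    int     diff_count;      /* number of samples in the current window */
    int32_t adjust_ns;       /* per-cycle correction to the master time */
    int64_t total_adjust_ns; /* accumulated correction since start */
} drift_comp_t;

/* Feed one cycle's diff (master time minus ref slave time, in ns).
 * Returns the correction to add to the master's time base this cycle. */
static int32_t drift_comp_update(drift_comp_t *dc, int32_t dc_diff_ns)
{
    dc->diff_sum_ns += dc_diff_ns;
    if (++dc->diff_count >= FILTER_CYCLES) {
        int64_t mean = dc->diff_sum_ns / dc->diff_count;
        /* Master ahead of the ref slave (mean > 0): slow the master down.
         * Master behind (mean < 0): speed it up. */
        if (mean > 0) {
            dc->adjust_ns--;
        } else if (mean < 0) {
            dc->adjust_ns++;
        }
        dc->diff_sum_ns = 0;
        dc->diff_count = 0;
    }
    dc->total_adjust_ns += dc->adjust_ns;
    return dc->adjust_ns;
}
```

This is a deliberately simple integrating controller.  A real application 
would probably also want a proportional term or limits on the step size so 
that the correction settles instead of hunting around zero.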

“However time of SYNC0(0x990) is changing quite regularly”:
if DC is enabled on the slave it is incremented by the cycle period every cycle 
(by the slave).  With a 32-bit DC clock the time value rolls over roughly every 
4.29 seconds (2^32 ns); with a 64-bit DC clock you see the whole time value.  
For DC to remain synced (and to enable DC sync0) the slave only needs a 32-bit 
DC clock, but if you want proper timestamping of events, you need the 64-bit 
clocks.
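
The rollover figure follows directly from the clock width, since DC time 
counts nanoseconds.  A tiny sketch (dc_rollover_seconds is a hypothetical 
helper, valid for widths below 64 bits):

```c
#include <stdint.h>

/* Rollover period, in seconds, of a DC clock that counts nanoseconds in a
 * register of the given width. Only valid for bits < 64. */
static double dc_rollover_seconds(unsigned bits)
{
    return (double)(UINT64_C(1) << bits) / 1e9;
}
```

For a 32-bit clock this is 2^32 ns, i.e. about 4.29 seconds.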

“0x990 is always like xxxxxxxxxx500000 (500000 is sync0_shift), why don't we 
try to make slaveTime to, like 0-phase-aligned”:
Once DC is set up, sync0 0x990 on the slave just keeps ticking over based on 
the slave's clock.  The slave's clock is synced to the ref slave's clock.  Your 
application can choose when sync0 occurs on the slave with respect to the cycle 
but has no other control over it.  However, your application must call 
ecrt_master_send() once every cycle so that the frames reach the slave before 
that cycle's sync0 event is triggered.  Other than that you can choose when to 
call ecrt_master_send().  In your realtime cycle you can choose when to wake up 
and perform your calculations.  That wakeup event is triggered in relation to 
the PC's clock.  So to wake up in relation to the ref slave's clock, you need 
to sync your application (PC) time to the ref slave's time.
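
To make the timing relationship concrete, here is a small sketch that 
computes where a time falls within the cycle and how long after the send the 
next sync0 event fires.  The helper names are hypothetical, both times are 
assumed to be in the same (ref slave) timebase, and frame transit and slave 
processing time are ignored:

```c
#include <stdint.h>

/* Phase of a DC time within the cycle, in ns (0 .. cycle_ns - 1). */
static uint32_t cycle_phase_ns(uint64_t dc_time_ns, uint32_t cycle_ns)
{
    return (uint32_t)(dc_time_ns % cycle_ns);
}

/* Time from the frame send until the next sync0 event, in ns.
 * Assumes cycle_ns is below 2^31 so the intermediate sum cannot overflow. */
static uint32_t send_to_sync0_margin_ns(uint64_t send_time_ns,
                                        uint64_t sync0_time_ns,
                                        uint32_t cycle_ns)
{
    uint32_t send_phase  = cycle_phase_ns(send_time_ns, cycle_ns);
    uint32_t sync0_phase = cycle_phase_ns(sync0_time_ns, cycle_ns);
    return (sync0_phase + cycle_ns - send_phase) % cycle_ns;
}
```

With a 1 ms cycle and a sync0 time ending in ...500000, the sync0 phase is 
500000 ns.  A send that happens 20 ns after the cycle boundary then leaves a 
margin of 499980 ns before sync0 fires.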

“BTW, is the write cmd to register 0x990 only sent once at the beginning”:
yes

Re #2)
“I didn't get the "first master diff" in syslog”:
I output all my app messages to the syslog via “rtai_lxrt(BIDX, SIZARG, PRINTK, 
&arg);”.  Via std out is fine too.

“And it's quite big”:
It should be small (within max jitter or so).  If it’s big to start with, it 
indicates the initial master time is not being set correctly in the ref slave 
on master activation.  If it becomes big then your drift compensation isn’t 
working.

“I don't know why you are saying m_dcDiff should be around 0”:
As per above, m_dcDiff is the difference between the master time when the 
frame was sent and the time at the ref slave.  If the initial slave time is set 
up correctly and you have no drift between the master and slave clocks (or 
account for the drift) then m_dcDiff should jitter around 0.


Looking at your logs:

  *   Your syslog logs are showing unmatched and corrupted frames.  You need to 
sort that out first.  Try contact cleaner on the RJ45 / EBus connections.  
Also, try higher quality shielded twisted pair cables.  You need to get it to 
zero comms errors.
  *   You need to get wireshark logs that include the response.  You could use 
the “ethercat pcap” command if you have patch 
“features\pcap\0001-pcap-logging.patch”.  You could also install a physical 
switch between your master and the first slave and use another computer as the 
sniffer (with all protocols disabled on that device's Ethernet port).  Because 
EtherCAT frames are broadcast you shouldn’t need to do anything special with 
the switch.
  *   Your crc log is a little weird.  There are no CRC or physical errors, but 
there are a lot of forwarded errors (more than the max count).  Maybe there are 
problems on the link between the master and the first slave.

Graeme.

> From: Circle Fang circlef...@live.com<mailto:circlef...@live.com>
> Sent: Saturday, 3 May 2025 20:10
> To: Graeme Foot graeme.f...@touchcut.com<mailto:graeme.f...@touchcut.com>
> Cc: etherlab-users@etherlab.org<mailto:etherlab-users@etherlab.org>
> Subject: Re: DC synchronization demo about etherlab master
>
> Hi Graeme,
> Basically I didn't figure out the following two things yet.

>   1.  why are we using pc time - reference time to calculate m_dcDiff 
> (m_dcTime - slaveTime, even it's named by m_dcTime, i think it's still a term 
> of pc time)? I think they could drift together, e.g. both delayed by 
> hundreds of microseconds, since the reference time (slaveTime) is strongly 
> related to the pc time of ecrt_master_send (get m_dcTime, then call 
> ecrt_master_send).  However time of SYNC0(0x990) is changing quite regularly, 
> so in this way, even if m_dcDiff is small enough, a dc sync error may still 
> occur because both m_dcTime and slaveTime may go beyond the time of 0x990. 
> And I'm wondering why not try to sync the master time to SYNC0 (probably with 
> a sync0_shift interval). In my test, when I watch the 64 bits of 0x990, it is 
> just sync0_shift-phase-aligned. I mean 0x990 is always like 
> xxxxxxxxxx500000 (500000 is sync0_shift), why don't we try to make slaveTime 
> to, like 0-phase-aligned (it could be a little complicated since slaveTime is 
> 32-bit, and I still haven't figured out how to do this yet, maybe change 
> ecrt_master_reference_clock_time to 64-bit), which means the time of 
> ecrt_master_send and slaveTime are both always drifting around the 0-phase of 
> SYNC0/0x990 (I mean make the 64-bit slaveTime always xxxxxxxxxabcdef wherein 
> abcdef is around 0). BTW, is the write cmd to register 0x990 only sent once 
> at the beginning?
>
>   2.  I didn't get the "first master diff" in syslog, but I do get this from 
> the cmd line from which I run the application. And it's quite big. I am not 
> clear about this but I think this is OK, since the 0-phase-aligned 
> m_dcTimeStart passed into ecrt_master_application_time at the beginning is 
> used to calculate the real dc start time (which is written to 0x990) about 
> 100 cycles in the future (EC_DC_START_OFFSET=100ms, eventually with 
> sync0_shift phase-aligned), and our first wakeup time is 50 cycles in the 
> future, even though it's 0-phase-aligned. I don't know why you are saying 
> m_dcDiff should be around 0.
>
> I did 2 tests: the first one uses your way, and the second uses my 
> way (I don't know whether it's right or not, but m_dcDiff is always drifting 
> around 0). Please see the attached logs. As for the wireshark logs, only 
> frames sent from the master are captured. In those tests there is no motion 
> task; the app only checks the received datagrams, adjusts the pc time, and 
> sends pdos, so the task is quite light-weight, and MSW (mode switches of 
> xenomai) is always 0. However, from wireshark, the sending time occasionally 
> gets odd. One earlier test (not recorded) was quite stable, as it ran for 
> 24 hours without any errors, even with the motion task running.
>
> The error "Failed to get reference clock time" is something like resource 
> unavailable, because I called this even the first time (no datagram 
> received yet). It sometimes also appears in the middle of a test for a short 
> time, but no dc sync error occurs. And it's not easy to reproduce.
>
> In addition, SMI, power-saving, and a lot of other features that may affect 
> realtime tasks are disabled. A fixed cpu frequency is also set, and 
> /proc/xenomai/stat is ok (no MSW).
>
> It seems my problem has nothing to do with problems by dc patches as 
> mentioned before, since i don't see the difference before/after patches 
> applied. 
>
> I am sorry this cost you so much time, and I am really really grateful about 
> this.
>
> Best Regards,
> Circle   
>
________________________________
From: Graeme Foot <graeme.f...@touchcut.com<mailto:graeme.f...@touchcut.com>>
Sent: 2 May 2025 4:21
To: Circle Fang <circlef...@live.com<mailto:circlef...@live.com>>
Cc: etherlab-users@etherlab.org<mailto:etherlab-users@etherlab.org> 
<etherlab-users@etherlab.org<mailto:etherlab-users@etherlab.org>>
Subject: RE: DC synchronization demo about etherlab master


Hi Circle,



m_dcDiff should jitter around zero.  The previous slave time to current slave 
time should jitter around your period (e.g. 1ms).  The PC clock total 
adjustment should drift by an approximate constant amount over time.  It can 
drift at slightly slower or faster rates over time due to electronics issues 
(such as thermal changes etc.)



The master needs to account for the time drift between the ref slave and PC 
clock so that it interacts with the fieldbus in the fieldbus's timeframe.  
Looking at 0x92C of subsequent slaves won’t help as that is their syncing to 
the ref slave.  We are dealing with the master syncing to the ref slave.



If you enable “ethercat debug 1” and start your app you should see in the 
logging (where main-1 is the ref slave number):

  *   Using slave main-1 as DC reference clock
  *   DEBUG 0-main-1: Checking system time offset.
  *   DEBUG 0-main-1: DC 64 bit system time offset calculation: 
system_time=50278625750, app_time=42973023400, diff=-7305602350
  *   DEBUG 0-main-1: Setting time offset to 18446744066403949266 (was 0)
  *   first master diff: -609.
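
The numbers in the debug lines above can be reproduced with plain unsigned 
64-bit arithmetic.  A small sketch, assuming (from the logged values, not 
from reading the master source) that diff = app_time - system_time and that 
the new offset is the old offset plus diff, wrapped to 64 bits.  The helper 
names are hypothetical:

```c
#include <stdint.h>

/* Signed difference between application time and slave system time, ns.
 * The unsigned subtraction wraps modulo 2^64; the cast reinterprets the
 * result as a signed value (two's complement). */
static int64_t dc_time_diff(uint64_t system_time_ns, uint64_t app_time_ns)
{
    return (int64_t)(app_time_ns - system_time_ns);
}

/* New 64-bit time offset: old offset plus the difference, modulo 2^64.
 * A negative difference therefore shows up as a very large unsigned value. */
static uint64_t dc_time_offset(uint64_t system_time_ns, uint64_t app_time_ns,
                               uint64_t old_offset)
{
    return old_offset + (app_time_ns - system_time_ns);
}
```

With system_time=50278625750, app_time=42973023400 and an old offset of 0, 
the difference is -7305602350 and the offset is 18446744066403949266, which 
is 2^64 - 7305602350.  So the large offset in the log is just a negative 
difference stored as an unsigned 64-bit value.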



The first master diff should be quite small (within the jitter range).



You shouldn’t be getting any "Failed to get reference clock time" messages.  
What is the error number that is output with it?



I’m starting to think you may be getting comms errors or something.  Can you 
send the kernel log messages (e.g. dmesg / journalctl -k) (with “ethercat 
debug 1” set before startup) and maybe the wireshark logs?



Check for comms error using the “ethercat crc” command, check for unmatched 
datagram errors in the system logs and potentially check the wireshark logs for 
mismatched frames around the time of your errors.



Also check that you have the CPU speed stepping (dynamic frequency scaling) 
turned off in the kernel configuration options.  That can cause problems with 
the timestamp clock.



If you have an intel CPU you may need to disable the SMI interrupt.



Also, what are the Yaskawa sync errors you are getting?  Are they A12 errors?  
If so these generally only occur (by default) after three missed PDOs in a 
row.  You don’t generally get those alarms when you are just having drifting 
errors.  Check the wireshark logs around the time of the error and check the 
reference clock times to see when the master is sending the frames.



Under RTAI you need to have the RTDM interface enabled to allow realtime calls 
from user space into the master's kernel space.  RTAI will allow you to make 
non-realtime syscalls, but you will lose hard realtime while the syscall is 
occurring.  You can check for any lost hard realtime events using the “cat 
/proc/rtai/scheduler” command.  I don’t know what the equivalent in Xenomai is.





Regards,

Graeme.





> From: Circle Fang circlef...@live.com<mailto:circlef...@live.com>
> Sent: Friday, 2 May 2025 05:33
> To: Graeme Foot graeme.f...@touchcut.com<mailto:graeme.f...@touchcut.com>
> Cc: etherlab-users@etherlab.org<mailto:etherlab-users@etherlab.org>
> Subject: Re: DC synchronization demo about etherlab master
>
> Hi Graeme,
>
> I think I was wrong about options a and b. I should compare the reference 
> clock time "slaveTime" with ideal values (i.e., initial slaveTime + 
> cycle_counters*cycle_ns). If that difference soon converges to 0, rather 
> than occasionally showing big weird values for several consecutive cycles 
> (bigger than the "sync error limit" threshold in the slave), that will prove 
> the master is well synced to the reference, right?
>
> I should continue on this bug.
>
> How do u_appTimeBase and m_dcDiff change in your app? Are there any patterns?
>
> Many thanks again for your help.
> Best Regards,
> Circle
>

________________________________

> From: Circle Fang <circlef...@live.com<mailto:circlef...@live.com>>
> Sent: 1 May 2025 16:22
> To: Graeme Foot <graeme.f...@touchcut.com<mailto:graeme.f...@touchcut.com>>
> Cc: etherlab-users@etherlab.org<mailto:etherlab-users@etherlab.org> 
> <etherlab-users@etherlab.org<mailto:etherlab-users@etherlab.org>>
> Subject: Re: DC synchronization demo about etherlab master
>
> Hi Graeme,
>
> Where and what value should I monitor if I want to check whether my master is 
> well synchronized to the reference slave or not? I think that value should 
> converge to 0 soon (or maybe to some constant value). I used the following 2 
> options:
>
> Option a: Initially, I monitor the reference's slaveTime - prev_slaveTime, 
> and it converges to 1000000, while the jitter of this value is about ±20 us. 
> I thought this should prove that the master is well synced to the reference, 
> since it implies that the time of ecrt_master_send is well aligned, but a dc 
> sync error still occurs occasionally when the app has been running for 
> several hours.
>
> Option b: Then I started to monitor m_dcDiff in ecMaster_syncDistClock (the 
> first m_dcDiff is probably several milliseconds, which I don't know why, and 
> I mark it as "fixed", so my m_dcDiff formula is m_dcDiff = 
> (uint32_t)ecMaster->m_dcTime - slaveTime - fixed). It also soon converges to 
> 0, while the jitter of m_dcDiff is about several microseconds.
>
> And I think options a and b are essentially the same. Can I use these values 
> to check synchronization, like 0x92c in other slaves? (BTW, 0x92c is usually 
> no more than 30 nanoseconds.)
>
> Sometimes I get an error from the master, "Failed to get reference clock 
> time", even though no dc sync error occurs at the same time.
>
> Finally, when I monitor u_appTimeBase, I can see it increasing (or 
> decreasing) monotonically, eventually by about 1 second per day. Is this 
> normal? (J1900 cpu and xenomai.)
>
> Any ideas or advice are highly appreciated.
>
> Best Regards,
> Circle

-- 
Etherlab-users mailing list
Etherlab-users@etherlab.org
https://lists.etherlab.org/mailman/listinfo/etherlab-users
