Hey Jun, if you look deep into the documentation, you'll notice that they say DC sync is influenced by the number of packets sent. Here are my benchmarks for a 6-slave system at power-up:
1. 1 ms sent_interval for the op_thread with a 4 ms transmit interval: 33 seconds
2. 500 us sent_interval for the op_thread with a 500 us transmit interval: 6 seconds
3. 100 us sent_interval for the op_thread with a 100 us transmit interval: 2.5 seconds

I believe the reason is the accuracy, which is better at shorter intervals.

On Wed, Jan 1, 2014 at 11:19 PM, Jun Yuan <[email protected]> wrote:
> Hi Raz,
>
> Many people have raised the same kind of questions you did. Some asked
> on the mailing list, some wrote to me directly, worrying about warnings
> like "slave did not sync after 5 seconds". For the past two years I
> kept answering that I didn't know much about the DC sync mechanism;
> that by examining register 0x092c it can be confirmed that the DCs get
> perfectly synchronized in the end anyway; that my customers would just
> have to accept my rule that they must wait several minutes doing
> nothing until the DCs on the EtherCAT bus have synchronized/converged;
> that maybe it is the slaves' fault for converging so slowly.
>
> Frankly speaking, I hate those answers; they are excuses. So I decided
> to fight back, and spent the last two days digging into this problem.
>
> The first thing to do was to learn how the DC sync mechanism works. I
> don't have any official EtherCAT documents, and would appreciate it if
> anyone could send me some of the EtherCAT specifications. On the
> internet I did find a paper, "On the Accuracy of the Distributed Clock
> Mechanism in EtherCAT", and a presentation, "Accurate Synchronization
> of EtherCAT Systems Using Distributed Clocks" by Joseph E. Stubbs.
> Those two documents helped me a lot.
>
> The other obstacle is that I don't have any EtherCAT slave devices at
> hand. Occasionally I receive a project to develop an interface for a
> new sort of slave using the EtherLab master.
> Those slaves usually stay with me for about two to three weeks, and
> after that they are shipped with my software to our customers. The
> chance of having a slave in my office is about 1/12, not to mention
> the deadline pressure from those projects. I still owe Florian an
> apology: he once asked me to test a new feature of the master, but I
> haven't given him a reply, because I've been waiting for a slave,
> expecting the next opportunity to come soon, which didn't happen. So I
> lack a testing environment, which narrows my view of EtherCAT, and I
> can't test my ideas myself.
>
> Alright, here is what I would like to share.
>
> I. The problem with "No app_time received up to now, but master
> already active."
>
> I have always gotten this warning if I don't call
> ecrt_master_application_time() before my realtime cycle loop. I've
> also tried giving a garbage value to the first call of this function
> outside my loop, and it didn't hurt my system at all. This phenomenon
> was recorded in my last mails to the mailing list, and Florian's reply
> was that I shouldn't do that. Well, he is right: in the first call,
> the app_time is saved as app_start_time and then used to calculate the
> "remainder" correction to the DC start time. By calling
> ecrt_master_application_time() prior to the cycle loop, we give a
> wrong starting point for the DC cyclic operation on the slave. I think
> the net effect is something like playing with sync0->shift_time, i.e.
> setting a shift time for DC sync0. Although this won't hurt us most of
> the time, it is not the right way to do it.
>
> Where does this warning come from? When a master application is
> running, there are two threads in the system: one is the user realtime
> cycle loop, the other is the EtherCAT-OP thread. These two threads,
> however, are not synchronized with each other.
>
> After calling ecrt_master_activate(), the master goes into
> ec_master_operation_thread(), which repeatedly executes the master's
> FSM (finite state machine). The cycle time of the EtherCAT-OP thread
> on my machine is 4 ms, since my Linux kernel runs at 250 Hz. The
> function ec_fsm_master_enter_write_system_times() gets called after
> several milliseconds, which I guess could be somewhere around 4 to
> 8 ms.
>
> If ecrt_master_application_time() is not called within that time, the
> master fails to get an app_time in time, and the "No app_time" warning
> occurs.
>
> In my case, my realtime thread happens to have a cycle time of 4 ms,
> and my loop looks like this:
>
>     // first do some initialization work, which costs 10 ms
>     while () {
>         wait_for_4_ms();
>         master_receive();
>         ...
>         master_application_time();
>         master_send();
>     }
>
> This means that after ecrt_master_activate(), at least 14 ms pass
> before the first master_application_time() in my loop gets called. The
> chance of getting a "No app_time" warning is therefore quite high.
>
> To resolve this problem properly, I can offer two options:
>
> The first option is to change your code: reduce the initialization
> time, making the interval between master_activate() and your cycle
> loop as small as possible.
>
> But what if we have a large cycle time, say 16 ms? Our cycle loop will
> wait 16 ms anyway before the first master_application_time() gets
> called, which could be too late for the EtherCAT-OP thread. So my
> second option is to change the code of the EtherCAT master. The
> simplest way to do so is to add a "return;" after the line
>
>     EC_MASTER_WARN(master, "No app_time received up to now,"
>                    " but master already active.\n");
>
> in master/fsm_master.c. This would force the master FSM to wait until
> it has got an app_time.
>
> Note that I don't have the possibility to do the test.
> So please change your EtherLab master code, try it out on your system,
> and give everybody feedback if it works.
>
> II. The problem with "Slave did not sync after 5000 ms"
>
> This is a little more complicated. In short, IMHO it is the master
> that should take responsibility for this problem.
>
> Concerning DC sync, there are three phases:
>
> Phase 1. Measure the transmission delay t_delay to each slave.
> Phase 2. Calculate the system time offset t_offset for each slave.
> Phase 3. Drift compensation, where each slave adjusts its local DC so
> that dt = (t_local + t_offset - t_delay) - t_received_system_time goes
> to 0.
>
> The first phase is executed during bus scanning in
> ec_fsm_master_state_scan_slave() -> ec_master_calc_dc() ->
> ec_master_calc_transmission_delays() -> ec_slave_calc_port_delays().
> It seems the EtherLab master measures this only once. One could argue
> that measuring the transmission delay several times and averaging
> would give a better estimate. So far my experience is that these
> values don't vary much, so the EtherLab master seems to be doing fine
> here. But I would appreciate it if anyone would do the "bus rescan"
> thing many times on the same EtherCAT bus and check whether the
> delay_to_next_dc of the slaves changes much between scans. If it does,
> the EtherLab master source must be changed to take several
> measurements instead of only one.
>
> At the beginning of 2013 I encountered a phenomenon, described in my
> earlier emails, which I tried to correct but failed in the end. My
> observation a year ago was that, after the bus had reached a stable
> state for all DCs, a restart of the master application would cause a
> wrong change of approx.
> 4 ms to the system_time_offset of the ref clock, and later
> ec_fsm_slave_config_state_dc_sync_check() for the ref slave shows
> around 4 ms of error between the master clock and the slave clock at
> the beginning. This demonstrates a weakness of the current EtherLab
> master in the second phase: the calculation of t_offset is not right.
>
> Since t_offset is given wrongly to the slaves by the master, the
> difference dt = (t_local + t_offset - t_delay) -
> t_received_system_time used for drift compensation becomes too large
> at the beginning. In my humble opinion, the EtherLab master may be
> abusing the drift compensation mechanism to compensate for its failure
> to calculate the system time offset t_offset accurately.
>
> What is the matter with the time offset? Let's look at the procedure
> of the time offset calculation:
>
> 1. The master FSM prepares an ec_datagram_fprd(fsm->datagram,
>    fsm->slave->station_address, 0x0910, 24) to read out the system
>    time of the slave.
> 2. The user realtime cycle loop sends the datagram out when calling
>    ecrt_master_send().
> 3. The next ecrt_master_receive() fetches the answer.
> 4. The master FSM reads the datagram and calculates the time offset.
>
> As an example, say we have a master FSM EtherCAT-OP thread running in
> a loop of 4 ms, and a user realtime application thread running at
> 1 ms. Let's say step 1 happens at time x ms, and the user loop runs
> 0.5 ms after the EtherCAT-OP thread.
>
> The following happens:
>
> Time: Event
> x ms: Step 1, the FSM prepares an FPRD datagram to 0x0910.
> x+0.5 ms: Step 2, the user loop sets a new app_time; the FPRD datagram
> gets sent out, and the sending timestamp is stored in
> datagram->jiffies_sent.
> x+1.5 ms: Step 3, the user loop sets a new app_time; the datagram is
> received, and the receiving timestamp is stored in
> datagram->jiffies_received.
> x+2.5 ms: The user loop sets a new app_time.
> x+3.5 ms: The user loop sets a new app_time.
> x+4 ms: Step 4, the FSM calculates the time offset.
>
> And here is the source code in ec_fsm_master_dc_offset64():
>
>     // correct read system time by elapsed time since read operation
>     correction = (u64) (jiffies_since_read * 1000 / HZ) * 1000000;
>     system_time += correction;
>     time_diff = fsm->slave->master->app_time - system_time;
>
> jiffies is a counter in the Linux kernel that is incremented at a
> frequency defined by HZ. I have a 250 Hz Linux system, so 1 jiffy
> means 4 ms. jiffies_sent was taken when the master clock was at
> x+0.5 ms, and the current jiffies value is taken at x+4 ms. There is a
> probability of 0.5/4 = 12.5% that jiffies did not increase during
> those 3.5 ms, and an 87.5% probability that jiffies increased by 1.
> This means "correction" typically has a value of 4000000 ns, and is
> occasionally 0 ns.
>
> Let's assume the slave DC is perfectly synchronized with the master
> app time, so the system_time read from the slave equals x+0.5 ms (the
> time the FPRD datagram was sent). With the correction added,
> system_time = x+4.5 ms or x+0.5 ms.
>
> The app_time at step 4 is x+3.5 ms.
>
> So time_diff = app_time - system_time = -1000000 ns most of the time,
> and occasionally +3000000 ns, depending on the correction.
>
> See, time_diff should actually be 0, not -1 ms or +3 ms, since, as we
> said, the slave DC is perfectly synchronized with the master app time.
>
> You may argue that a -1 ms error isn't that much, but this error
> typically grows to around -4 ms if the user realtime cycle loop runs
> every 4 ms, as in my case a year ago.
>
> Where does the error in the calculation come from? Two reasons:
>
> 1. jiffies has a bad resolution of 4 ms on a 250 Hz Linux system.
> 2. app_time is not the time at which step 4 is executed.
>
> While using get_cycles() instead of jiffies could improve the accuracy
> of the correction, the fact that app_time is not the current master
> system time would still drag errors into the time offset.
>
> Why do we need the "correction" here at all? Because the app_time in
> step 4 is not the app_time at which the slave's system time was read.
>
> The key is to have the correct app_time at which the FPRD datagram to
> 0x0910 was sent, and to use that app_time to calculate time_diff,
> without any correction, of course.
>
> I know, that is easier said than done. Right now I have two ideas for
> the master.
>
> The first idea: add a new variable app_time_sent to the ec_datagram_t
> struct and write down the app_time whenever a datagram gets sent. Then
> time_diff = datagram->app_time_sent - system_time(0x0910).
>
> The second idea is a little tricky: trigger the calculation from the
> user realtime cycle loop. That is, we check the fsm_datagram in
> ecrt_master_receive(), or even in ecrt_master_application_time() while
> the last app_time is still there. If we find it is an FPRD 0x0910
> datagram, we do the calculation right away using the old app_time.
>
> I think the first idea would be easier to implement.
>
> Besides the inaccurate calculation of the time offset, the other issue
> in the EtherLab master that bothers me is that the drift compensation
> seems to run at the same time as the new system time offset is
> calculated and sent to the slaves, since the drift compensation runs
> in the user realtime cycle loop while the t_offset calculation runs in
> the EtherCAT-OP thread.
> Shouldn't we get the offset calculation done first, before sending the
> ref_sync_datagram to the ref clock and the sync_datagram to the other
> slaves? Won't the drift compensation algorithm of the slaves affect
> their local DC time (by slowing down or speeding up the clock), which
> then affects the t_offset calculation? Since phases 2 and 3 happen
> simultaneously, won't the sudden change of t_offset (which causes a
> sudden change of dt) cause some sort of disturbance to the drift
> compensation algorithm on the slave?
>
> I think we may need a boolean, set by the FSM, telling the user thread
> whether phase 2 is done; the user thread would only call
> ecrt_master_sync_reference_clock(master) and
> ecrt_master_sync_slave_clocks(master) once the correct system time
> offset for each slave has been sent to the slaves.
>
> Sorry for writing such a long email; I hope I've made my thoughts
> clear. I could be wrong in many places, and I'll be very happy if
> somebody changes the EtherLab master code the way I described and
> tests it for me.
>
> Wish all of you a Happy New Year!
>
> Jun
>
> On Mon, Dec 30, 2013 at 2:32 PM, Raz <[email protected]> wrote:
> > Hey
> >
> > At the moment it takes a long time to calibrate the DC, approx. 5
> > seconds for each slave. I am setting up a system which is supposed
> > to control over 12 axes, and the calibration duration reaches a
> > minute.
> >
> > Is it possible to reduce this time?
> >
> > --
> > https://sites.google.com/site/ironspeedlinux/
> >
> > _______________________________________________
> > etherlab-users mailing list
> > [email protected]
> > http://lists.etherlab.org/mailman/listinfo/etherlab-users
>
> --
> Jun Yuan
> [Pronunciation: Djün Üän]
>
> Robotics Technology Leaders GmbH
> Am Loferfeld 58, D-81249 München
> Tel: +49 89 189 0465 24
> Mobile: +49 176 2176 5238
> Fax: +49 89 189 0465 11
> mailto: [email protected]
>
> Umlaut rule in the Chinese phonetic script Pinyin: after the initials
> y, j, q, and x, u is pronounced as ü, e.g. yu => ü, ju => dschü,
> qu => tschü, xu => schü.

--
https://sites.google.com/site/ironspeedlinux/
