Re: [etherlab-dev] Hot plugged modules failing to read DC register

Graeme Foot Wed, 25 Sep 2019 18:29:35 -0700

Hi,

There is no "AL status message" occurring in the syslog.  As such there is also 
no acknowledgement.


I just mentioned the reg_read request as an aside to say that I can't manually 
query the state of the slave while it is in SAFEOP+ERROR.

The error flag is set and the slave goes to SAFEOP+ERROR due to DC failing to 
sync.
DC fails to sync due to the master not setting the DC time offset.
The master does not set the DC time offset due to the has_dc_system_time flag 
not being set.
The has_dc_system_time flag is not set because the slave does not respond to 
the 0x0910 reg read request.


I'm not using any patch to disable automatic rescan as far as I'm aware.  I'm 
only using your patchset and the extra's I've submitted.

Cheers,
Graeme.

From: Gavin Lambert <gavin.lamb...@tomra.com>
Sent: Thursday, 26 September 2019 1:13 PM
To: Graeme Foot <graeme.f...@touchcut.com>; etherlab-dev@etherlab.org
Subject: RE: Hot plugged modules failing to read DC register

I'm not talking about a reg_read request.

When the slave first goes into SAFEOP+ERROR, the master should detect this 
during its regular slave scan cycle (fsm_master_state_read_al_status).  It then 
calls ec_fsm_change_ack to acknowledge the ERROR state - as it does this, it 
should report something like this to the syslog (even at debug 0 level):

kernel: [49369.927687] EtherCAT ERROR 0-main-2: AL status message 0x001B: "Sync 
manager watchdog".
kernel: [49369.953541] EtherCAT 0-main-2: Acknowledged state SAFEOP.

This is the message that tells you why the slave was unhappy and went to the 
ERROR state in the first place.  After this, the slave should be in SAFEOP (no 
ERROR), and the master will usually then bring it back to INIT and do a full 
reconfigure and back to OP (if the master app is still running).

If you have the quick-op patch applied, and the AL status was 0x001B and not 
some other error (and the master hasn't restarted since the slave entered the 
error state), then instead of going back to INIT in the above case it will try 
to go straight back from SAFEOP to OP to reduce the downtime.  As mentioned in 
the patch notes, this is fine in most cases but it can upset some DC slaves (as 
it may not give them enough time to resync), so you may want to disable this 
feature via configure.  (You can clearly see in the syslogs when it's 
attempting a quick return to OP.)

The slave should always accept acknowledging the ERROR state and returning to 
INIT.

The "slave has error_flag set" thing is different; that's referring to 
slave->error_flag, which is only set when the slave fails to respond correctly 
to requests (such as refusing to perform AL state changes or not responding to 
internal register requests), despite still responding to the broadcast AL state 
request (used to determine the number of active slaves).  The only way to clear 
this flag is to rescan, either via "ethercat rescan" or by the automatic rescan 
that it performs when the number of slaves changes (as it normally does during 
a genuine comms interruption).

So have a look further back in the logs to see why the error_flag is being set 
in the first place.

Though also make sure that you're not using any extra patches which disable the 
automatic rescan.  I recall someone on this list a while ago suggesting this 
was a good idea; it isn't.


Gavin Lambert
Senior Software Developer


[cid:image001.png@01D5746C.903BDD30]
[TOMRA]<http://www.compacsort.com/>[Facebook]<https://www.facebook.com/Compacsort>[Linkedin]<https://www.linkedin.com/company/compac-sorting-equipment/>[Youtube]<https://vimeo.com/compacsort>[twitter]<https://twitter.com/compacsort>[instagram]<https://www.instagram.com/compacsort/>

COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland 1061 | New 
Zealand
Switchboard: +64 96 34 00 88 | tomra.com<http://www.tomra.com>

The information contained in this communication and any attachment is 
confidential and may be legally privileged. It should only be read by the 
person(s) to whom it is addressed. If you have received this communication in 
error, please notify the sender and delete the communication.

From: Graeme Foot <graeme.f...@touchcut.com<mailto:graeme.f...@touchcut.com>>
Sent: Thursday, 26 September 2019 12:32
To: Gavin Lambert <gavin.lamb...@tomra.com<mailto:gavin.lamb...@tomra.com>>; 
etherlab-dev@etherlab.org<mailto:etherlab-dev@etherlab.org>
Subject: RE: Hot plugged modules failing to read DC register

Hi Gavin,

When 0x0910 returns a working counter of 0 the master assumes the slave only 
supports delay measurement and the slave->has_dc_system_time flag remains zero. 
 If the flag is zero a few DC configuration steps are missed.  Subsequently, 
when the slave DC Assign Active value is set the slave has not been set up to 
use DC correctly and fails to sync.

So without the patch (extracts from attached "DC Sync Error.txt"):
[  270.369125] EtherCAT DEBUG 0-main-3: Ready for requests.
[  270.376908] EtherCAT DEBUG 0-main-3: Slave has no System Time register; 
delay measurement only.
[  270.393234] EtherCAT WARNING 0-main-3: SII Read Error, EEPROM not loaded.  
Retrying...
[  270.476850] EtherCAT 0-main-3: SII EEPROM loaded.  Continuing.
[  270.916606] EtherCAT DEBUG 0-main-3: Unknown category type 0x003C.
[  270.916612] EtherCAT DEBUG 0-main-3: Slave is not in the state to do mailbox 
com (INIT), setting to PREOP.
[  270.916616] EtherCAT DEBUG 0-main-3: Configuring...
[  270.920595] EtherCAT DEBUG 0-main-3: Now in INIT.
[  270.920598] EtherCAT DEBUG 0-main-3: Clearing FMMU configurations...
[  270.920601] EtherCAT DEBUG 0-main-3: Clearing mailbox check flag...
[  270.922603] EtherCAT DEBUG 0-main-3: Clearing sync manager configurations...
[  270.924595] EtherCAT DEBUG 0-main-3: Configuring mailbox sync managers...
[  270.924600] EtherCAT DEBUG 0-main-3: SM0: Addr 0x1000, Size 128, Ctrl 0x26, 
En 1
[  270.924605] EtherCAT DEBUG 0-main-3: SM1: Addr 0x1080, Size 128, Ctrl 0x22, 
En 1
[  270.930589] EtherCAT DEBUG 0-main-3: Now in PREOP.
[  270.930592] EtherCAT DEBUG 0-main-3: Finished configuration.

[  273.639104] EtherCAT WARNING 0-main-3: Slave seems not to support 
distributed clocks!
[  273.647686] EtherCAT DEBUG 0-main-3: Setting DC cycle times to 1000000 / 0.
[  273.649100] EtherCAT DEBUG 0-main-3: Checking for synchrony.
[  273.743070] EtherCAT DEBUG 0-main-3: Sync after  100 ms: 1828300146 ns
[  273.842996] EtherCAT DEBUG 0-main-3: Sync after  200 ms: 1828303744 ns
...
[  278.440471] EtherCAT DEBUG 0-main-3: Sync after 4800 ms: 1830172951 ns
[  278.540418] EtherCAT DEBUG 0-main-3: Sync after 4900 ms: 1830251113 ns
[  278.640369] EtherCAT WARNING 0-main-3: Slave did not sync after 5000 ms.
[  278.648810] EtherCAT DEBUG 0-main-3: app_start_time=0
[  278.648814] EtherCAT DEBUG 0-main-3:       app_time=289356014683
[  278.648817] EtherCAT DEBUG 0-main-3:     start_time=289456014683
[  278.648820] EtherCAT DEBUG 0-main-3:          cycle=1000000
[  278.648824] EtherCAT DEBUG 0-main-3:     shift_time=500000
[  278.648827] EtherCAT DEBUG 0-main-3:      remainder=14683
[  278.648830] EtherCAT DEBUG 0-main-3:          start=289457500000
[  278.648850] EtherCAT DEBUG 0-main-3: Setting DC cyclic operation start time 
to 289457500000.
[  278.650379] EtherCAT DEBUG 0-main-3: Setting DC AssignActivate to 0x0300.

[  278.695998] EtherCAT DEBUG 0-main-3: Now in SAFEOP.
[  283.688588] EtherCAT ERROR 0-main-3: Timeout while setting state OP.
[  283.726198] EtherCAT 0: Slave states on main device: PREOP, SAFEOP, OP + 
ERROR.
[  283.766130] EtherCAT DEBUG 0-main-3: SAFEOP -> SAFEOP + ERROR.

Comparing that to with the patch (extracts from the attached "DC With 
Patch.txt"):
[   47.873112] EtherCAT DEBUG 0-main-3: Ready for requests.
[   47.873140] EtherCAT DEBUG 0-main-3: Scanning slave 3 on main link.
[   47.881120] EtherCAT WARNING 0-main-3: Slave did not respond to System Time 
register request, retrying...
[   47.956443] EtherCAT 0-main-3: Slave has the System Time register.
[   48.021036] EtherCAT DEBUG 0-main-3: SII firmware file not found; reading 
SII data from slave.
[   48.400841] EtherCAT DEBUG 0-main-3: Unknown category type 0x003C.
[   48.400848] EtherCAT DEBUG 0-main-3: Slave is not in the state to do mailbox 
com (INIT), setting to PREOP.
[   48.400852] EtherCAT DEBUG 0-main-3: Configuring...
[   48.404827] EtherCAT DEBUG 0-main-3: Now in INIT.
[   48.404831] EtherCAT DEBUG 0-main-3: Clearing FMMU configurations...
[   48.404835] EtherCAT DEBUG 0-main-3: Clearing mailbox check flag...
[   48.406831] EtherCAT DEBUG 0-main-3: Clearing sync manager configurations...
[   48.408828] EtherCAT DEBUG 0-main-3: Clearing DC assignment...
[   48.410832] EtherCAT DEBUG 0-main-3: Configuring mailbox sync managers...
[   48.410839] EtherCAT DEBUG 0-main-3: SM0: Addr 0x1000, Size 128, Ctrl 0x26, 
En 1
[   48.410843] EtherCAT DEBUG 0-main-3: SM1: Addr 0x1080, Size 128, Ctrl 0x22, 
En 1
[   48.416831] EtherCAT DEBUG 0-main-3: Now in PREOP.
[   48.416836] EtherCAT DEBUG 0-main-3: Finished configuration.

[   49.991966] EtherCAT DEBUG 0-main-3: Checking system time offset.
[   49.993418] EtherCAT DEBUG 0-main-3: DC 64 bit system time offset 
calculation: system_time=2217603868, app_time=59887013582, diff=57669409714
[   49.993423] EtherCAT DEBUG 0-main-3: Setting time offset to 57669409714 (was 
0)

[   51.152316] EtherCAT DEBUG 0-main-3: Setting DC cycle times to 1000000 / 0.
[   51.154320] EtherCAT DEBUG 0-main-3: Checking for synchrony.
[   51.256259] EtherCAT DEBUG 0-main-3: Sync after  100 ms:      14466 ns
[   51.354208] EtherCAT DEBUG 0-main-3: Sync after  200 ms:      14695 ns
...
[   51.955878] EtherCAT DEBUG 0-main-3: Sync after  800 ms:      12253 ns
[   52.053823] EtherCAT DEBUG 0-main-3: Sync after  900 ms:      11113 ns
[   52.133801] EtherCAT DEBUG 0-main-3: 10000 ns difference after 970 ms.
[   52.133806] EtherCAT DEBUG 0-main-3: app_start_time=0
[   52.133810] EtherCAT DEBUG 0-main-3:       app_time=62029021245
[   52.133813] EtherCAT DEBUG 0-main-3:     start_time=62129021245
[   52.133816] EtherCAT DEBUG 0-main-3:          cycle=1000000
[   52.133820] EtherCAT DEBUG 0-main-3:     shift_time=500000
[   52.133823] EtherCAT DEBUG 0-main-3:      remainder=21245
[   52.133826] EtherCAT DEBUG 0-main-3:          start=62130500000
[   52.133829] EtherCAT DEBUG 0-main-3: Setting DC cyclic operation start time 
to 62130500000.
[   52.135789] EtherCAT DEBUG 0-main-3: Setting DC AssignActivate to 0x0300.

[   52.184538] EtherCAT DEBUG 0-main-3: Now in SAFEOP.
[   52.749441] EtherCAT DEBUG 0-main-3: Now in OP. Finished configuration.


Without the patch there is no AL Error code, just a failure to DC Sync.  
Without the slave->has_dc_system_time flag being set the DC time offset is not 
applied so the slave has no chance to sync.

Without the patch it took the slave approx. 108ms for the SII loading to be 
complete.  With the patch it took approx. 83ms for the 0x0910 register to 
respond.  They are generally a similar timeframe so I suspect the slave does 
not reliably respond while it is still initialising.

Also, restarting the application or manually changing the modules state to INIT 
or PREOP and back to OP doesn't sort out the flag.  If there has been enough 
time for the modules DC clock to eventually sync it will reach OP, but if not 
it will go back to SAFEOP + ERROR.  It takes a "rescan" command to the master 
to sort out the flag so that the offset calculation can be performed.

I've also found that when the slave is in SAFEOP + ERROR reg_read requests (and 
pretty much all communications to the slave) fail due to "Aborting register 
request, slave has error flag set.".  So it's a little hard to manually check 
if there's any error status set.

Graeme.


From: Gavin Lambert <gavin.lamb...@tomra.com<mailto:gavin.lamb...@tomra.com>>
Sent: Wednesday, 25 September 2019 5:06 PM
To: Graeme Foot <graeme.f...@touchcut.com<mailto:graeme.f...@touchcut.com>>; 
etherlab-dev@etherlab.org<mailto:etherlab-dev@etherlab.org>
Subject: RE: Hot plugged modules failing to read DC register

When the slave goes to safeop+error it should also output an AL error code 
which might give a hint as to why.  This should be logged to the syslog when 
the master acknowledges the error.

AL error 0x001B, for example, indicates that the slave stopped receiving SM 
frames (typical of a comms interruption) - and features/quick-op in the 
patchset tries to do a quicker recovery for this case by trying to go straight 
back to OP instead of going through a full PREOP reconfiguration.  It's 
possible that some slaves may need the full reconfigure, so you can disable 
this behaviour at configure time.

Other AL error codes mean other things, such as your DC cycle being poorly 
synced and frames not occurring in a strict SYNC0-SM-SYNC0-SM ordering.


But I wouldn't normally expect any standard registers to fail WC when this 
occurs, unless perhaps the slave was performing a full power reset (or 
otherwise holding the slave's ESC in reset).  Though this would interrupt comms 
to any downstream slaves as well, so it's not something that slaves are 
supposed to do of their own accord.  (And it shouldn't stay in safeop+error in 
that case, it should revert to Init, although that's up to the slave 
implementation.)

90ms seems a bit slow for just an ESC power-on SII read, although it's possible 
that it's doing something more complicated.

I'm not really familiar with those modules, however; you're probably best off 
asking Beckhoff directly.


Gavin Lambert
Senior Software Developer


[cid:image001.png@01D5746C.903BDD30]
[TOMRA]<https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.compacsort.com&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545822065&sdata=4v%2FLiit48W4T0CTGsry1ht9XN4rQFjF2sd9oTr0qYko%3D&reserved=0>[Facebook]<https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.facebook.com%2FCompacsort&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545822065&sdata=54WxDplleMlI8eNdg0Ww44QP9ruo7WMzPU59LtZkYSE%3D&reserved=0>[Linkedin]<https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Fcompac-sorting-equipment%2F&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545832054&sdata=16LH%2FgzUujQGM73f0tA99iUTZKQqO0lDQ41bVXl440E%3D&reserved=0>[Youtube]<https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fvimeo.com%2Fcompacsort&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545832054&sdata=qjMN%2B8ZoxIl3as6CW7Ro4IdZUUA1Uro02ONcEKSTgSM%3D&reserved=0>[twitter]<https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2Fcompacsort&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545842052&sdata=oCwVN5X9I%2F6T4PCqVUpqFDN9%2B0HhzjlA%2BoA2euzikAQ%3D&reserved=0>[instagram]<https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.instagram.com%2Fcompacsort%2F&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545842052&sdata=V6jixq%2BG%2FgvXrZoR0gCRO8em6kDbusZ5cNo56Zld%2BNQ%3D&reserved=0>

COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland 1061 | New 
Zealand
Switchboard: +64 96 34 00 88 | 
tomra.com<https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.tomra.com&data=02%7C01%7Cgavin.lambert%40tomra.com%7C1ff312cb3ca744c8d84608d74219029f%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637050547545852042&sdata=rWnlGtoCRWYNL8TRJLa400PjLG8n8fwsXOSzJN0Q28U%3D&reserved=0>

The information contained in this communication and any attachment is 
confidential and may be legally privileged. It should only be read by the 
person(s) to whom it is addressed. If you have received this communication in 
error, please notify the sender and delete the communication.

From: Graeme Foot
Sent: Tuesday, 24 September 2019 17:20
To: etherlab-dev@etherlab.org<mailto:etherlab-dev@etherlab.org>
Subject: [etherlab-dev] Hot plugged modules failing to read DC register

Hi,

I've had occasional issues with EL7332 and EL7342 modules where they will go to 
SafeOp + Error if you try and use them in DC mode.  I've finally had some time 
to look into it a little further.

When the modules go to SafeOp + Error the master outputs the message "Slave has 
no System Time register; delay measurement only." (with debug level 1).  This 
occurs due to the datagram reading register 0x0910 returning a working counter 
of zero.

I created a quick hack to retry reading the register up to 100 times before 
failing.  After approx. 90ms the EL7342 module I'm testing with successfully 
returned the datagram and the slave entered Op state successfully.

In my test setup I also have an EL5101 module that was doing the exact same 
thing (and taking around the same time), but I've never really had issues with 
them before.  I suspect the difference is that if you have incorrect settings 
on the EL7342 module and try to run a motor it can error out and reset itself, 
causing a situation equivalent to a hot plug.

Without my hack both modules need to wait for the SII read to complete for a 
similar length of time, so it looks like the slaves do not respond to the 
0x0910 register request until the EEPROM read is complete.  Does anyone know if 
this is expected behaviour, or know of a better solution than to retry reading 
the register (up to 200ms ???)?


Regards,
Graeme.

_______________________________________________
etherlab-dev mailing list
etherlab-dev@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev

Re: [etherlab-dev] Hot plugged modules failing to read DC register

Reply via email to