I have found the likely cause in my case. It appears to be an erratum in the Intel 82579 Ethernet Controller. Ralf, if you have that controller, I suggest you change it.
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/6-and-c200-chipset-specification-update.pdf

Erratum 17 from the above document:

17. Intel® 82579 Gigabit Ethernet Controller Transmission Issue

Problem: Intel® 82579 Gigabit Ethernet Controller with the Intel 6 Series
Chipset and Intel C200 Series Chipset and Intel ME Firmware 7.x 5 MB may
stop transmitting during a data transfer.

Implication: Intel 82579 Gigabit Ethernet Controller may stop transmitting
packets, the link LED will blink, and a power cycle may be required to
resume transmission activity.

Note: This issue has only been observed in a focused test environment where
data is constantly transferred over an extended period of time (more than
approximately 3 hours).

Workaround: A combination of Intel ME Firmware code change and Intel 82579
Gigabit Ethernet Controller LAN Driver update has been identified and may
be implemented as a workaround for this erratum.

Status: No Plan to Fix.

On Mon, 2016-07-11 at 10:32 -0700, Henry Bausley wrote:
> FYI,
>
> Just changing the host ethernet port seems to have alleviated our
> issues with UNMATCHED datagrams. We saw something virtually identical
> to what Ralf saw.
>
> [451886.660655] EtherCAT 0: Domain 0: Working counter changed to 0/13.
> [451886.660663] EtherCAT 0: Domain 1: Working counter changed to 0/14.
> [451887.168147] EtherCAT WARNING: Datagram cea4900c (domain0-0-main) was SKIPPED 44 times.
> [451887.168154] EtherCAT WARNING: Datagram cea49c0c (domain1-332-main) was SKIPPED 44 times.
> [451887.492141] EtherCAT WARNING 0: 1 datagram TIMED OUT!
> [451887.492148] EtherCAT WARNING 0: 731 datagrams UNMATCHED!
> [451887.661361] EtherCAT 0: Domain 0: Working counter changed to 13/13.
> [451887.661369] EtherCAT 0: Domain 1: Working counter changed to 14/14.
>
> In our case the Advantech UNO industrial PC has 4 ethernet ports built
> into it.
> Only the 1st ethernet port built into the motherboard exhibits
> the issue. It shows up as an Ethernet controller: Intel Corporation
> 82579LM Gigabit Network Connection. It appears that is just a PHY, so
> I assume the MAC is in the Intel Corporation 6 Series/C200 Series
> Chipset.
>
> The other 3 ports are actually PCI Express MAC/PHYs; they show up as
> Intel Corporation 82574L Gigabit Network. Those 3 ports do not exhibit
> the UNMATCHED datagram issue.
>
> When using ethtool -k, the only difference I see for the 82579LM versus
> the three 82574L is rx-vlan-filter: off for the 82579LM.
> rx/tx-checksumming is on for all adapters.
>
> FYI,
> The registers 0x300 and 0x310 remained 0 after the UNMATCHED
> datagram error occurred.
>
> I suggest you look into changing the NIC, Ralf.
>
> On Mon, 2016-07-04 at 08:29 +0200, Ralf Roesch wrote:
> > We also are fighting with this type of problem on a customer laser
> > cutting machine.
> > Occasionally we see errors like this:
> > [122501.934306] EtherCAT 0: Domain 0: Working counter changed to 0/9.
> > [122501.934346] EtherCAT 0: Domain 1: Working counter changed to 0/9.
> > [122502.320449] EtherCAT WARNING 0: 5 datagrams TIMED OUT!
> > [122502.935224] EtherCAT 0: Domain 0: Working counter changed to 9/9.
> > [122502.935265] EtherCAT 0: Domain 1: Working counter changed to 9/9.
> >
> > This was the reason I modified the ethercat command line tool for
> > extended diagnostics regarding several ESC error registers.
> >
> > Attached you will find a patch which might help you.
> > After applying and building the ethercat command line tool it will
> > provide a new command "diag".
> >   * Shortly after your ethercat master has been started
> >     successfully, call:
> >       ethercat diag -r
> >     This will reset all slaves' ESC error registers, including the Lost
> >     Link Counter Register and RX Error Counter Register.
> >   * If you detect an UNMATCHED and TIMEOUT error (sometimes
> >     after hours or days), call:
> >       ethercat diag
> >     If you are lucky you will find one or more ESC errors
> >     displayed on your console.
> >     To better understand the displayed errors, you should look at the
> >     picture
> >     http://www.automation.com/images/article/ethercat/Figure14.jpg
> >     (part of
> >     http://www.automation.com/automation-news/article/diagnostics-with-ethercat-part-4).
> >
> > Would be happy about any kind of feedback.
> >
> > @Henry: which type of drives do you use?
> >
> > Regards,
> > Ralf
> >
> > On Mon Jul 04 2016 05:19:58 GMT+0200 (CEST), Graeme Foot
> > <graeme.f...@touchcut.com> wrote:
> >
> > > The only time we've had issues like that has been due to either a dodgy
> > > network cable or an RJ45 plug getting a bit grubby. First thing I
> > > usually do is unplug/replug all the plugs a few times to clean up the
> > > connections. If it persists then I start looking for bad cables.
> > >
> > > Another option is that there is an occasional noisy process causing noise
> > > on one of the links.
> > >
> > > Once or twice (only on non-ethercat machines so far) we've had cables
> > > that were in drag chains wearing out, where it showed a problem when at a
> > > specific position of the drag chain.
> > >
> > > You could track down if it's a problem with a link between two particular
> > > slaves by checking each slave's Lost Link Counter and CRC Bad Counter
> > > values.
> > > - Lost Link Counter Register (0x0310:0x0313)
> > > - RX Error Counter Register (0x0300:0x0307)
> > >
> > > This link describes some of the diagnostics:
> > > http://www.automation.com/automation-news/article/diagnostics-with-ethercat-part-4
> > >
> > > I think you can set the above registers to zero after the fieldbus is up
> > > and running, then you can check them if a problem occurs.
> > >
> > > Haven't actually done it yet myself, so would be interested to see if it
> > > helps you.
> > >
> > > Regards,
> > > Graeme.
> > >
> > > -----Original Message-----
> > > From: etherlab-users [mailto:etherlab-users-boun...@etherlab.org] On
> > > Behalf Of Henry Bausley
> > > Sent: Saturday, 2 July 2016 5:56 a.m.
> > > To: etherlab-users@etherlab.org
> > > Subject: [etherlab-users] Intermittent Large number of datagrams UNMATCHED
> > >
> > > We have an etherlab 1.5.2 kernel mode application running in xenomai
> > > 2.4.6 on Ubuntu 14.04.1 Desktop that will, on rare occasions, get a large
> > > number of datagrams UNMATCHED. It occurs at random times and relatively
> > > rarely, but when it occurs it can result in disaster as we are running a
> > > large number of servos in torque mode.
> > >
> > > For example, we can run continuously for 5 days, 24 hours a day, then
> > > get a message like the one below.
> > >
> > > [591785.735172] EtherCAT WARNING 0: 616 datagrams UNMATCHED!
> > >
> > > I am struggling as to where to look. Is this something in our app or a
> > > known bug in the stack?
> > >
> > > _______________________________________________
> > > etherlab-users mailing list
> > > etherlab-users@etherlab.org
> > > http://lists.etherlab.org/mailman/listinfo/etherlab-users
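
For anyone who wants to check the two ESC register ranges mentioned in this thread (RX Error Counter 0x0300:0x0307 and Lost Link Counter 0x0310:0x0313) without the "diag" patch, here is a minimal sketch of how the raw bytes could be decoded per port. It is not code from the thread. The byte layout is an assumption taken from the usual ESC register map (two bytes per port at 0x0300: invalid-frame count, then RX-error count; one byte per port at 0x0310), and the function names are made up for illustration. How you obtain the raw bytes is up to you; the master's "ethercat reg_read" command may be one way, depending on your build.

```python
from typing import Dict, List

# Register ranges named in the thread.
RX_ERROR_BASE = 0x0300   # RX Error Counter Register, 0x0300:0x0307
LOST_LINK_BASE = 0x0310  # Lost Link Counter Register, 0x0310:0x0313
PORTS = 4


def decode_esc_counters(rx_error: bytes, lost_link: bytes) -> List[Dict[str, int]]:
    """Decode raw ESC counter bytes into one dict per port.

    Assumed layout (not stated in the thread):
      0x0300 + 2*port      invalid-frame counter for that port
      0x0300 + 2*port + 1  RX-error counter for that port
      0x0310 + port        lost-link counter for that port
    """
    if len(rx_error) != 2 * PORTS or len(lost_link) != PORTS:
        raise ValueError("expected 8 bytes from 0x0300 and 4 bytes from 0x0310")
    return [
        {
            "port": port,
            "invalid_frames": rx_error[2 * port],
            "rx_errors": rx_error[2 * port + 1],
            "lost_links": lost_link[port],
        }
        for port in range(PORTS)
    ]


def ports_with_errors(counters: List[Dict[str, int]]) -> List[int]:
    """Return the port numbers whose counters are not all zero."""
    return [
        c["port"]
        for c in counters
        if c["invalid_frames"] or c["rx_errors"] or c["lost_links"]
    ]


if __name__ == "__main__":
    # Hypothetical sample bytes, not measured data.
    sample = decode_esc_counters(bytes([0, 0, 3, 1, 0, 0, 0, 0]), bytes([0, 2, 0, 0]))
    for entry in sample:
        print(entry)
    print("ports with errors:", ports_with_errors(sample))
```

If every counter on every slave stays at zero after an UNMATCHED event, as Henry reported for 0x300 and 0x310, that points away from cabling between slaves and toward the master side, which fits the NIC explanation above.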