Re: [etherlab-users] Alternating working count 0/24 (zero and complete)

J. van der Wulp Thu, 17 Jul 2014 02:42:08 -0700

Thanks for your response.

On 07/16/2014 12:43 AM, Gavin Lambert wrote:
> On 15 July 2014, quoth J. van der Wulp:
>>  - when a frame exceeds the 128 byte threshold then increasingly often
>> the latency of the response frame increases (seems a 100microsecond
>> offset) but our time budget (time between send() and receive()) is 100
>> microseconds. This is the cause for working count 0 errors.
>>  - as long as the process data is such that frame size stays below ~128
>> bytes there is no problem, the working counts stay stable and response
>> latency is more or less constant
> 
> Sure you're not getting a 10Mbit link instead of 100Mbit?  128 bytes of data
> at 5kHz will just about saturate a 10Mbit link.


This is a very interesting thought. It took me a while to verify that
the link is indeed 100Mbit. I had to patch the driver to be sure. I
placed the following fragment in ec_poll:

        struct ethtool_cmd cmd = { ETHTOOL_GSET };

        /* Query the current link settings and log the speed in Mb/s. */
        dev->ethtool_ops->get_settings(dev, &cmd);
        netif_info(tp, probe, tp->dev, "SPEED %u.\n",
                   ethtool_cmd_speed(&cmd));

which resulted in 100 printed repeatedly. I also established that during
link negotiation the slave as well as the master side advertise 100Mb
(among others).
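
As a sanity check on the arithmetic behind the 10Mbit question, here is a
minimal sketch in C. It assumes that the ~128 bytes is the full Ethernet
frame length (header and FCS included) and adds the 20 bytes of preamble,
start-of-frame delimiter and inter-frame gap that occupy the wire but do
not show up in a capture. The function names are made up for illustration.

```c
#include <stdint.h>

/* Bytes that occupy the wire per frame but are not part of the frame:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20u

/* Time one frame occupies the wire, in nanoseconds.
 * frame_bytes: frame length including Ethernet header and FCS (assumption).
 * link_mbit:   link speed in Mbit/s (10 or 100 here). */
static uint64_t wire_time_ns(uint32_t frame_bytes, uint32_t link_mbit)
{
    uint64_t bits = (uint64_t)(frame_bytes + WIRE_OVERHEAD_BYTES) * 8u;

    /* bits / (Mbit/s) gives microseconds; scale by 1000 for nanoseconds. */
    return bits * 1000u / link_mbit;
}

/* Load on one direction of the link, in whole percent (rounded down),
 * when one such frame is sent every cycle. */
static uint32_t link_load_percent(uint32_t frame_bytes, uint32_t link_mbit,
                                  uint32_t cycle_hz)
{
    /* ns per frame * frames per second = ns of wire time per second;
     * dividing by 1e9 gives the fraction, times 100 gives percent. */
    return (uint32_t)(wire_time_ns(frame_bytes, link_mbit) * cycle_hz
                      / 10000000u);
}
```

Under these assumptions a 128 byte frame needs about 118 microseconds on
a 10Mbit link, which alone exceeds a 100 microsecond budget, but only
about 12 microseconds at 100Mbit. So a fallback to 10Mbit would have
explained the symptoms, whereas the wire time at 100Mbit does not.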

So far I have found no other good explanation of why the ~128 byte
boundary is so special. I have done an experiment with a couple of
Beckhoff modules with process data of 67 bytes, which I could scale up
to 15 kHz without serious problems (just an occasional working count
error once every couple of seconds). Yet when crossing the ~128 byte
boundary at 5 kHz it still collapses.

Inspired by older patched versions of the r8169 driver, I made a change
to the ec_poll routine which relieves the symptoms (see the diff at the
end of this mail). I still have a bit of an uneasy feeling about this
change, as it removes the inspection of the interrupt status register
and directly starts the rtl_rx/rtl_tx buffer processing. Why would the
interrupt status be good with frames below 128 bytes and not good
otherwise? I still have the feeling that I am missing something.

> 
>>  - use the generic module when operating at 5Khz (only tested with 1.5.2
>> with frame size less than ~128 bytes) gives the same working count 0
>> symptoms, for our application we really seem to need the patched
>> drivers...
> 
> The generic driver is rarely stable over 1kHz; sometimes not even that.
> 
> 

I now capture on the debug interface, but it drops a lot of packets,
and I am not sure whether the capture timestamps I get are realistic.
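
One possible cross-check for the capture timestamps is to timestamp the
cycle inside the application itself. The following is only a sketch of
such bookkeeping; the struct and function names are hypothetical and not
part of the EtherCAT master API.

```c
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end (both taken from the same clock). */
static int64_t elapsed_ns(const struct timespec *start,
                          const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/* Running statistics over all cycles (hypothetical helper). */
struct latency_stats {
    int64_t  worst_ns;     /* largest interval seen so far */
    uint64_t over_budget;  /* number of cycles that exceeded the budget */
};

/* Record one measured interval against the given budget. */
static void latency_record(struct latency_stats *s, int64_t ns,
                           int64_t budget_ns)
{
    if (ns > s->worst_ns)
        s->worst_ns = ns;
    if (ns > budget_ns)
        s->over_budget++;
}
```

The idea would be to call clock_gettime(CLOCK_MONOTONIC, ...) right
before send() and right after receive() returns, and pass the difference
to latency_record(). This measures the interval the application actually
leaves between the two calls, not the round trip of the frame itself, so
it can only show whether the 100 microsecond budget is really being kept.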

diff -r 8dd49f6f6d32 devices/r8169-3.4-ethercat.c
--- a/devices/r8169-3.4-ethercat.c	Mon May 05 13:55:00 2014 +0200
+++ b/devices/r8169-3.4-ethercat.c	Tue Jul 15 13:21:31 2014 +0200
@@ -5618,13 +5618,9 @@
 	status = rtl_get_events(tp);
 	rtl_ack_events(tp, status & ~tp->event_slow);
 
-	if (status & RTL_EVENT_NAPI_RX) {
-		rtl_rx(dev, tp, 100); // FIXME
-    }
-
-	if (status & RTL_EVENT_NAPI_TX) {
-		rtl_tx(dev, tp);
-    }
+	rtl_rx(dev, tp, 100); // FIXME
+
+	rtl_tx(dev, tp);
 
 	if (jiffies - tp->ec_watchdog_jiffies >= 2 * HZ) {
 		void __iomem *ioaddr = tp->mmio_addr;

_______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users
