Dear Francois, other r8169 experts, 

On 22.01.2018 at 01:09, Francois Romieu wrote:
> Are you able to retrieve the layout ? That is, does it appear to match:
> 
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
> 
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
> 
> or something else ?
> 

It took me a while; the bug does not seem to occur every time, so there
may also be a race involved.
Reproducing it on an Ubuntu 17.10 system, I found the following:

Address in virtual memory || value
0x7f87bb4c6000            || 0x00000217
0x7f87bb4c6008            || 0x000003ab
0x7f87bb4c6018            || 0x00000000
0x7f87bb4c6028            || 0x00000279
0x7f87bb4c6030            || 0x000000e1
0x7f87bb4c6038            || 0x00000051

At almost the same time, I found the following numbers in /proc/self/net/dev for
the device:

             decimal || hex
RX bytes:    870820  || 0x000d49a4
   packets:     945  || 0x000003b1
   errs           0  || 
   drop           0  || 
   fifo           0  ||  
   frame          0  || 
   compressed     0  || 
   multicast     83  || 0x00000053
TX bytes:     58505  || 0x0000e489
   packets:     535  || 0x00000217
   errs           0  || 
   drop           0  ||
   fifo           0  || 
   frame          0  || 
   compressed     0  || 
   multicast      0  || 

Since there was a small delay (reading /proc/self/net/dev happened a few
seconds later), these values are off from the memory dump by a few packets.

So I deduce the layout:
0x7f87bb4c6000   TX Packets
0x7f87bb4c6008   RX Packets
0x7f87bb4c6010    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6018   ???
0x7f87bb4c6020    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6028   ???
0x7f87bb4c6030   ???
0x7f87bb4c6038   RX multicast (?)

So the only thing that is fully clear is that TX packets come first,
followed by RX packets.
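
To make the comparison between the dump and /proc/self/net/dev explicit, here is a minimal C sketch. It only uses the values quoted above; the helper names counter_delta and compare_dump_with_proc are made up for illustration, and treating offset 0x38 as RX multicast is still only a guess at this point.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* How far the later /proc/self/net/dev reading is ahead of the dumped value. */
static int64_t counter_delta(uint64_t dumped, uint64_t proc_value)
{
        return (int64_t)proc_value - (int64_t)dumped;
}

/*
 * Compare the dumped words with the /proc/self/net/dev values from this mail.
 * Returns the number of counters that appear to run backwards, which would
 * contradict the guessed layout (counters only ever grow).
 */
static int compare_dump_with_proc(void)
{
        static const struct {
                const char *name;
                uint64_t dumped;        /* value found in the corrupted page */
                uint64_t proc_value;    /* value read a few seconds later */
        } rows[] = {
                { "TX packets       (+0x00)", 0x217, 535 },
                { "RX packets       (+0x08)", 0x3ab, 945 },
                { "RX multicast (?) (+0x38)", 0x51,   83 },
        };
        int backwards = 0;

        for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
                int64_t d = counter_delta(rows[i].dumped, rows[i].proc_value);

                printf("%s: dump=%llu proc=%llu delta=%lld\n", rows[i].name,
                       (unsigned long long)rows[i].dumped,
                       (unsigned long long)rows[i].proc_value,
                       (long long)d);
                if (d < 0)
                        backwards++;
        }
        return backwards;
}
```

Calling compare_dump_with_proc() from a small main() gives deltas of 0, 6 and 2 packets, all non-negative, which fits a dump taken shortly before the /proc read.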

Checking through the driver sources, I find that rtnl_link_stats64 cannot be
the culprit, since it has rx_packets first and tx_packets only after it.
However, struct rtl8169_counters looks like:
struct rtl8169_counters {
        __le64  tx_packets;
        __le64  rx_packets;
        __le64  tx_errors;
        __le32  rx_errors;
        __le16  rx_missed;
        __le16  align_errors;
        __le32  tx_one_collision;
        __le32  tx_multi_collision;
        __le64  rx_unicast;
        __le64  rx_broadcast;
        __le32  rx_multicast;
        __le16  tx_aborted;
        __le16  tx_underun;
};
This could very well match the structure found in memory, so something
related to rtl8169_do_counters, i.e. the DMA transfer of the counters,
appears to be broken.
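
As a cross-check of that match, here is a minimal userspace sketch. It assumes natural alignment and uses fixed-width types in place of __le64/__le32/__le16; struct counters_mirror and rx_breakdown_matches are hypothetical names, not part of the driver. It computes the field offsets and tests whether the three trailing RX values in the dump add up to the RX packet count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace mirror of struct rtl8169_counters (hypothetical name). */
struct counters_mirror {
        uint64_t tx_packets;
        uint64_t rx_packets;
        uint64_t tx_errors;
        uint32_t rx_errors;
        uint16_t rx_missed;
        uint16_t align_errors;
        uint32_t tx_one_collision;
        uint32_t tx_multi_collision;
        uint64_t rx_unicast;
        uint64_t rx_broadcast;
        uint32_t rx_multicast;
        uint16_t tx_aborted;
        uint16_t tx_underun;
};

/* True if unicast + broadcast + multicast equals the RX packet count. */
static int rx_breakdown_matches(uint64_t rx_packets, uint64_t unicast,
                                uint64_t broadcast, uint64_t multicast)
{
        return unicast + broadcast + multicast == rx_packets;
}

/* Print the offsets that correspond to the addresses in the dump. */
static void print_offsets(void)
{
        printf("rx_packets   +0x%02zx\n", offsetof(struct counters_mirror, rx_packets));
        printf("tx_errors    +0x%02zx\n", offsetof(struct counters_mirror, tx_errors));
        printf("rx_errors    +0x%02zx\n", offsetof(struct counters_mirror, rx_errors));
        printf("tx_one_coll  +0x%02zx\n", offsetof(struct counters_mirror, tx_one_collision));
        printf("rx_unicast   +0x%02zx\n", offsetof(struct counters_mirror, rx_unicast));
        printf("rx_broadcast +0x%02zx\n", offsetof(struct counters_mirror, rx_broadcast));
        printf("rx_multicast +0x%02zx\n", offsetof(struct counters_mirror, rx_multicast));
        printf("sizeof       %zu\n", sizeof(struct counters_mirror));
}
```

Under these assumptions the "???" slots at +0x28 and +0x30 would be rx_unicast and rx_broadcast, and +0x38 would be rx_multicast. The dumped values fit: 0x279 + 0xe1 + 0x51 = 0x3ab, exactly the RX packet count at +0x08, so rx_breakdown_matches(0x3ab, 0x279, 0xe1, 0x51) returns true.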

Does this help? Can I provide more info? I get the feeling this affects many
tens of thousands of systems and has only stayed hidden because
network stats are rarely read...

Cheers,
Oliver
