[E1000-devel] Possible races in e1000

Eugene Shatokhin Fri, 02 Aug 2013 08:46:50 -0700

Hi,

There seem to be several data races in e1000 driver (checked a couple of 
usage scenarios on the vanilla Linux kernel 3.11-rc1). Perhaps, not that 
critical but still worth fixing, I suppose.


1.
         CPU0                               CPU1
e1000_probe()
   |
   register_netdev()
   |
   |                                     e1000_open()
   |                                       |
   |                                       e1000_configure()
   |                                         |
   |                                         e1000_set_rx_mode()
   |                                           |
   |                                           rctl = er32(RCTL);
   |                                           (e1000_main.c:2237)
   |                                           |
   |                                           |
   e1000_vlan_filter_on_off                    |
   (e1000_main.c:1222)                         |
     |                                         |
     ew32(RCTL, rctl);                         |
     (e1000_main.c:4823)                       |
                                               |
                                               ew32(RCTL, rctl);
                                               (e1000_main.c:2259)

As soon as register_netdev() registers the network device at 
e1000_main.c:1218, something (e.g. NetworkManager) may try to access the 
device thus triggering e1000_open() and all the path depicted above.

The race window is rather small but still it is possible for the write 
of RCTL in e1000_vlan_filter_on_off() interfere with its update in 
e1000_set_rx_mode() as depicted above.

Perhaps, moving the call to e1000_vlan_filter_on_off() in e1000_probe() 
before register_netdev() will fix this:

------------------------------
+       e1000_vlan_filter_on_off(adapter, false);
+
         strcpy(netdev->name, "eth%d");
         err = register_netdev(netdev);
         if (err)
                 goto err_register;

-       e1000_vlan_filter_on_off(adapter, false);
------------------------------

2.
The accesses to adapter->link_speed, adapter->link_duplex, 
adapter->stats.*, hw->tx_packet_delta, hw->collision_delta  in 
e1000_get_ethtool_stats() are probably not synchronized with those in 
e1000_update_stats().

Example:
         CPU0                                    CPU1
  e1000_watchdog
  (e1000_main.c:2500)
    |
    e1000_update_stats
      |
      <lock adapter->stats_lock>               e1000_get_ethtool_stats
      (e1000_main.c:3620)                        |
      |                                          |
      |                                          data[i] = ... *(u32 *)p;
      |                                          (e1000_ethtool.c:1850)
      adapter->stats.xonrxc += er32(XONRXC);
      (e1000_main.c:3651)
      |
      <unlock adapter->stats_lock>
      (e1000_main.c:3749)


e1000_update_stats() as well as some other functions take 
'adapter->stats_lock' when they access these data, perhaps
e1000_get_ethtool_stats() should do that too?

3.
Our tools also report a number of possible races between e1000_clean() 
(NAPI callback) and e1000_xmit_frame() but we may have missed something.

NAPI poll callbacks and ndo_start_xmit callbacks cannot race with each 
other on a single CPU. But in my experiments, they executed on different 
CPUs.

Is there anything that ensures they never execute concurrently even on 
different CPUs? If so, I'll update our tools accordingly to avoid such 
false positives.

If the callbacks may execute concurrently then there are races in e1000 
on the accesses to tx_ring->next_to_clean, tx_ring->next_to_use, 
tx_ring->buffer_info[i].*, tx_desc->upper.data. Same for the accesses to 
dql->num_queued in netdev_tx_sent_queue() and netdev_completed_queue() 
called from these callbacks.

Example:

         CPU0                                    CPU1
  e1000_clean()
  (e1000_main.c:3812)
    |
    e1000_clean_tx_irq                   e1000_xmit_frame
    (e1000_main.c:3869)                  (e1000_main.c:3260)
      |                                    |
      e1000_unmap_and_free_tx_resource     e1000_tx_map
      (e1000_main.c:1961)                  (e1000_main.c:2880)
        |                                    |
        if (buffer_info->dma) { ...          buffer_info->dma = ...
------------------------------

So, what do you think?

Regards,

Eugene

-- 
Eugene Shatokhin, ROSA Laboratory.
www.rosalab.com

------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases havent 
caught up. So what steps can you take to put your SQL databases under 
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit 
http://communities.intel.com/community/wired

[E1000-devel] Possible races in e1000

Reply via email to