Subject: [BUG] e1000: KCSAN: data-race in e1000_clean_tx_irq / e1000_tso

Dear Maintainers,

We are writing to report a data race in the `e1000` network driver, detected by
KCSAN. The bug was found by our custom fuzzing tool, RacePilot. The race occurs
when `e1000_tso()` writes the `next_to_watch` ring index of a transmit buffer
with a plain, unannotated store, while `e1000_clean_tx_irq()` concurrently reads
the same field without holding a lock. Both functions are inlined, so KCSAN
reports them under their callers, `e1000_xmit_frame()` and `e1000_clean()`. We
observed this bug on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.

Call Trace & Context
==================================================================
BUG: KCSAN: data-race in e1000_clean / e1000_xmit_frame

write to 0xffffc9000414e492 of 2 bytes by task 5500 on cpu 0:
 e1000_tso drivers/net/ethernet/intel/e1000/e1000_main.c:2752 [inline]
 e1000_xmit_frame+0x14fe/0x2a90 drivers/net/ethernet/intel/e1000/e1000_main.c:3226
 __netdev_start_xmit include/linux/netdevice.h:5273 [inline]
 netdev_start_xmit include/linux/netdevice.h:5282 [inline]
 xmit_one net/core/dev.c:3853 [inline]
 dev_hard_start_xmit+0xee/0x3a0 net/core/dev.c:3869
 ...
 tcp_write_xmit+0xf64/0x3ee0 net/ipv4/tcp_output.c:3002
 tcp_push_one+0x87/0xa0 net/ipv4/tcp_output.c:3199
 
read to 0xffffc9000414e492 of 2 bytes by interrupt on cpu 1:
 e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3871 [inline]
 e1000_clean+0x2fb/0x1570 drivers/net/ethernet/intel/e1000/e1000_main.c:3804
 __napi_poll+0x5d/0x3f0 net/core/dev.c:7666
 napi_poll net/core/dev.c:7729 [inline]
 net_rx_action+0x6cc/0x890 net/core/dev.c:7881
 handle_softirqs+0xbe/0x290 kernel/softirq.c:622
 ...

value changed: 0x0083 -> 0x008d

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 194865 Comm: syz.3.8279 Not tainted 6.18.0-08691-g2061f18ad76e-dirty #44 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================

Execution Flow & Code Context
On the transmit path, `e1000_tso()` fills in a TCP segmentation offload (TSO)
context descriptor and sets `buffer_info->next_to_watch` to the current ring
index `i`:
```c
// drivers/net/ethernet/intel/e1000/e1000_main.c
static int e1000_tso(struct e1000_adapter *adapter,
                     struct e1000_tx_ring *tx_ring, struct sk_buff *skb,
                     __be16 protocol)
{
        ...
                context_desc->cmd_and_length = cpu_to_le32(cmd_length);

                buffer_info->time_stamp = jiffies;
                buffer_info->next_to_watch = i; // <-- Concurrent 2-byte lockless write

                if (++i == tx_ring->count)
                        i = 0;
        ...
}
```

Concurrently, `e1000_clean_tx_irq()`, called from the NAPI poll routine
`e1000_clean()` in softirq context, reclaims completed transmit descriptors. It
reads `next_to_watch` for the entry at `next_to_clean` before it checks the
hardware completion bit (`E1000_TXD_STAT_DD`) of the descriptor that index
points to:
```c
// drivers/net/ethernet/intel/e1000/e1000_main.c
static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
                               struct e1000_tx_ring *tx_ring)
{
        ...
        i = tx_ring->next_to_clean;
        eop = tx_ring->buffer_info[i].next_to_watch; // <-- Concurrent 2-byte lockless read
        eop_desc = E1000_TX_DESC(*tx_ring, eop);

        while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&
        ...
```

Root Cause Analysis
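`e1000_clean_tx_irq()` loads `next_to_watch` with a plain, unlocked access and
only afterwards checks the `E1000_TXD_STAT_DD` completion bit of the descriptor
it points to. Meanwhile, `e1000_xmit_frame()` on another CPU may be queuing new
buffers onto the same ring and writing `next_to_watch` entries in the same
`buffer_info` array, so KCSAN observes a concurrent plain write and plain read
of the same 2-byte field. At the logic level, the value read is only acted upon
if the hardware has set the `DD` bit on the indicated descriptor; if the bit is
clear, the cleaning loop does not run, which suggests the race is benign in
practice. However, because neither access is annotated, the compiler is free to
cache, re-load, or (in principle) tear them, and KCSAN cannot distinguish an
intentional lockless access from a genuine bug.
Unfortunately, we were unable to generate a reproducer for this bug.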
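To make the `DD`-guard argument above concrete, the following is a
single-threaded model of the cleaning loop. It is a hypothetical
simplification, not driver code: `struct model_ring`, `model_clean_tx()` and
`MODEL_RING_SIZE` are invented for illustration, a `bool` array stands in for
the `E1000_TXD_STAT_DD` bit, and DMA unmapping, barriers, and concurrency are
all omitted. It shows that a stale `next_to_watch` entry (here, the
zero-initialized slot after the last queued packet) is not acted on while the
`DD` bit of the descriptor it names is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tiny ring; the real ring size comes from tx_ring->count. */
#define MODEL_RING_SIZE 8u

struct model_ring {
	/* Per slot: index of the packet's last descriptor (hypothetical model). */
	uint16_t next_to_watch[MODEL_RING_SIZE];
	/* Stands in for the E1000_TXD_STAT_DD bit of each descriptor. */
	bool dd[MODEL_RING_SIZE];
	unsigned int next_to_clean;
};

/*
 * Simplified model of the e1000_clean_tx_irq() loop: read next_to_watch
 * first, then only clean if the DD bit of that descriptor is set.
 * Returns the number of descriptors reclaimed.
 */
static unsigned int model_clean_tx(struct model_ring *r)
{
	unsigned int i = r->next_to_clean;
	unsigned int eop = r->next_to_watch[i];
	unsigned int count = 0;

	while (r->dd[eop] && count < MODEL_RING_SIZE) {
		bool cleaned = false;

		for (; !cleaned; count++) {
			cleaned = (i == eop);
			r->dd[i] = false; /* models clearing the descriptor status */
			if (++i == MODEL_RING_SIZE)
				i = 0;
		}
		/* Look up the last descriptor of the next packet in the ring. */
		eop = r->next_to_watch[i];
	}
	r->next_to_clean = i;
	return count;
}
```

For example, with one packet in slots 0 to 2 (`next_to_watch[0] = 2`, `dd[2]`
set) and a second packet in slots 3 and 4 whose `DD` bit is still clear, the
first call reclaims 3 descriptors and stops at slot 3; nothing more is cleaned
until `dd[4]` is set.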

Potential Impact
The impact is theoretical. If the compiler were to tear or re-load the
unannotated accesses, `e1000_clean_tx_irq()` could act on an inconsistent `eop`
index, which in the worst case might corrupt transmit ring state or disrupt
networking. Even if the race is benign, it keeps triggering KCSAN reports, which
adds noise to concurrency testing.

Proposed Fix
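Annotating the accesses to `next_to_watch` with `WRITE_ONCE()` and `READ_ONCE()`
marks them as intentionally lockless, which silences this KCSAN report and
prevents the compiler from tearing, fusing, or re-loading them. Note that these
macros are not memory barriers; they only constrain how the compiler emits each
individual access: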

```diff
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2749,7 +2749,7 @@ static int e1000_tso(struct e1000_adapter *adapter,
                context_desc->cmd_and_length = cpu_to_le32(cmd_length);
 
                buffer_info->time_stamp = jiffies;
-               buffer_info->next_to_watch = i;
+               WRITE_ONCE(buffer_info->next_to_watch, i);
 
                if (++i == tx_ring->count)
                        i = 0;
@@ -3839,7 +3839,7 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
        unsigned int bytes_compl = 0, pkts_compl = 0;
 
        i = tx_ring->next_to_clean;
-       eop = tx_ring->buffer_info[i].next_to_watch;
+       eop = READ_ONCE(tx_ring->buffer_info[i].next_to_watch);
        eop_desc = E1000_TX_DESC(*tx_ring, eop);
 
        while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&
```
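*(Note: the other writes to `next_to_watch`, in `e1000_tx_map()` and
`e1000_tx_csum()`, should be annotated with `WRITE_ONCE()` in the same way. In
addition, KCSAN reports the read at line 3871, whereas the second hunk patches
line 3842; the reported access may be the re-read of `next_to_watch` near the
end of the cleaning loop, which would need `READ_ONCE()` as well.)*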
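For readers less familiar with these macros, the following userspace sketch
approximates what `READ_ONCE()` and `WRITE_ONCE()` do for a scalar field such as
`next_to_watch`: each access goes through a `volatile`-qualified lvalue, so the
compiler must perform exactly the load or store that is written and may not
cache, merge, or repeat it. For naturally aligned scalars of machine-word size
or smaller, GCC and Clang in practice emit a single untorn access. This is a
simplified assumption, not the kernel's definition: the real macros in
`include/asm-generic/rwonce.h` also add compile-time checks on the access size,
the names `MODEL_READ_ONCE`, `MODEL_WRITE_ONCE` and `struct model_tx_buffer` are
hypothetical, and `__typeof__` is a GCC/Clang extension. Neither macro implies a
memory barrier.

```c
#include <stdint.h>

/*
 * Simplified userspace approximations of the kernel's READ_ONCE()/WRITE_ONCE()
 * (hypothetical names). Each access goes through a volatile lvalue, so the
 * compiler may not cache, merge, or repeat it. No memory barrier is implied.
 */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, val)                         \
	do {                                             \
		*(volatile __typeof__(x) *)&(x) = (val); \
	} while (0)

/* Hypothetical stand-in for the driver's per-slot buffer_info entry. */
struct model_tx_buffer {
	unsigned long time_stamp;
	uint16_t next_to_watch;
};

/* Producer side, analogous to the patched store in e1000_tso(). */
static void model_set_next_to_watch(struct model_tx_buffer *bi, uint16_t i)
{
	MODEL_WRITE_ONCE(bi->next_to_watch, i);
}

/* Consumer side, analogous to the patched load in e1000_clean_tx_irq(). */
static uint16_t model_get_next_to_watch(struct model_tx_buffer *bi)
{
	return MODEL_READ_ONCE(bi->next_to_watch);
}
```

Storing `0x008d` (the new value in the KCSAN report above) with
`model_set_next_to_watch()` and reading it back with `model_get_next_to_watch()`
returns `0x008d`; the access discipline only matters when the two calls run on
different CPUs, as in the reported race.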

We hope this report is helpful.

Best regards,
RacePilot Team
