Subject: [BUG] e1000: KCSAN: data-race in e1000_clean_tx_irq / e1000_tso
Dear Maintainers,
We are writing to report a KCSAN-detected data race in the `e1000` network
driver. This bug was found by our custom fuzzing tool, RacePilot. The race
occurs when `e1000_tso()` stores a new ring index into
`buffer_info->next_to_watch` with a plain, unannotated write, while
`e1000_clean_tx_irq()` concurrently reads the same field without holding any
lock. We observed this bug on Linux kernel version
6.18.0-08691-g2061f18ad76e-dirty.
Call Trace & Context
==================================================================
BUG: KCSAN: data-race in e1000_clean / e1000_xmit_frame
write to 0xffffc9000414e492 of 2 bytes by task 5500 on cpu 0:
e1000_tso drivers/net/ethernet/intel/e1000/e1000_main.c:2752 [inline]
e1000_xmit_frame+0x14fe/0x2a90 drivers/net/ethernet/intel/e1000/e1000_main.c:3226
__netdev_start_xmit include/linux/netdevice.h:5273 [inline]
netdev_start_xmit include/linux/netdevice.h:5282 [inline]
xmit_one net/core/dev.c:3853 [inline]
dev_hard_start_xmit+0xee/0x3a0 net/core/dev.c:3869
...
tcp_write_xmit+0xf64/0x3ee0 net/ipv4/tcp_output.c:3002
tcp_push_one+0x87/0xa0 net/ipv4/tcp_output.c:3199
read to 0xffffc9000414e492 of 2 bytes by interrupt on cpu 1:
e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3871 [inline]
e1000_clean+0x2fb/0x1570 drivers/net/ethernet/intel/e1000/e1000_main.c:3804
__napi_poll+0x5d/0x3f0 net/core/dev.c:7666
napi_poll net/core/dev.c:7729 [inline]
net_rx_action+0x6cc/0x890 net/core/dev.c:7881
handle_softirqs+0xbe/0x290 kernel/softirq.c:622
...
value changed: 0x0083 -> 0x008d
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 194865 Comm: syz.3.8279 Not tainted
6.18.0-08691-g2061f18ad76e-dirty #44 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
==================================================================
Execution Flow & Code Context
During normal transmission on an `e1000` TX ring, `e1000_tso()` writes the TSO
parameters into a context descriptor and records the ring index in
`buffer_info->next_to_watch`:
```c
// drivers/net/ethernet/intel/e1000/e1000_main.c
static int e1000_tso(struct e1000_adapter *adapter,
		     struct e1000_tx_ring *tx_ring, struct sk_buff *skb,
		     __be16 protocol)
{
	...
		context_desc->cmd_and_length = cpu_to_le32(cmd_length);
		buffer_info->time_stamp = jiffies;
		buffer_info->next_to_watch = i;	// <-- concurrent 2-byte lockless write
		if (++i == tx_ring->count)
			i = 0;
	...
}
```
Meanwhile, `e1000_clean_tx_irq()`, running from the NAPI poll loop (softirq
context), reclaims completed descriptors. It reads `next_to_watch` for the slot
at `next_to_clean` and only then checks the hardware completion bit
(`E1000_TXD_STAT_DD`) of the descriptor that index points to:
```c
// drivers/net/ethernet/intel/e1000/e1000_main.c
static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
			       struct e1000_tx_ring *tx_ring)
{
	...
	i = tx_ring->next_to_clean;
	eop = tx_ring->buffer_info[i].next_to_watch;	// <-- concurrent 2-byte lockless read
	eop_desc = E1000_TX_DESC(*tx_ring, eop);
	while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&
	...
```
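To make the handoff between the two paths concrete, we include a simplified,
single-threaded userspace model. It is a sketch under our own assumptions, not
driver code: all names (`model_ring`, `model_xmit()`, `model_clean()`,
`model_hw_complete()`, `MODEL_STAT_DD`) are hypothetical, only the first slot
of each packet records the end-of-packet (EOP) index, hardware completion is
simulated by setting the DD flag directly, and reclaimed descriptors have their
status cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RING_SIZE 8
#define MODEL_STAT_DD   0x1u	/* hypothetical stand-in for E1000_TXD_STAT_DD */

struct model_desc {
	uint32_t status;		/* simulated hardware sets DD here */
};

struct model_buffer_info {
	uint16_t next_to_watch;		/* index of the packet's EOP descriptor */
};

struct model_ring {
	struct model_desc desc[MODEL_RING_SIZE];
	struct model_buffer_info buffer_info[MODEL_RING_SIZE];
	unsigned int next_to_use;
	unsigned int next_to_clean;
};

/* Producer side: queue a packet that occupies 'ndesc' descriptors. */
static void model_xmit(struct model_ring *r, unsigned int ndesc)
{
	unsigned int first = r->next_to_use;
	unsigned int eop;

	assert(ndesc >= 1 && ndesc <= MODEL_RING_SIZE);
	eop = (first + ndesc - 1) % MODEL_RING_SIZE;
	for (unsigned int n = 0; n < ndesc; n++)
		r->desc[(first + n) % MODEL_RING_SIZE].status = 0;
	/* The first slot remembers where the packet ends (plain 2-byte store). */
	r->buffer_info[first].next_to_watch = (uint16_t)eop;
	r->next_to_use = (eop + 1) % MODEL_RING_SIZE;
}

/* Simulated hardware: report the descriptor at 'idx' as done. */
static void model_hw_complete(struct model_ring *r, unsigned int idx)
{
	r->desc[idx].status |= MODEL_STAT_DD;
}

/* Consumer side: load next_to_watch first, then test DD on the EOP
 * descriptor it names, in the same order as e1000_clean_tx_irq().
 * Returns the number of descriptors reclaimed. */
static unsigned int model_clean(struct model_ring *r)
{
	unsigned int i = r->next_to_clean;
	unsigned int eop = r->buffer_info[i].next_to_watch;	/* plain load */
	unsigned int count = 0;

	while ((r->desc[eop].status & MODEL_STAT_DD) && count < MODEL_RING_SIZE) {
		bool cleaned = false;

		while (!cleaned) {
			cleaned = (i == eop);
			r->desc[i].status = 0;	/* model: reclaimed slot loses DD */
			i = (i + 1) % MODEL_RING_SIZE;
			count++;
		}
		eop = r->buffer_info[i].next_to_watch;
	}
	r->next_to_clean = i;
	return count;
}
```

For example, after queuing a 3-descriptor and a 2-descriptor packet into a
zeroed ring, `model_clean()` reclaims nothing until `model_hw_complete(&r, 2)`
is called, then reclaims 3 descriptors; after `model_hw_complete(&r, 4)` it
reclaims the remaining 2. In that last call the loop ends by loading a stale
`next_to_watch` from slot 5, and it is the cleared DD bit that stops it.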
Root Cause Analysis
KCSAN reports a data race because `e1000_clean_tx_irq()` loads `next_to_watch`
with a plain read before it has checked the `E1000_TXD_STAT_DD` completion
bit, while `e1000_xmit_frame()` on another CPU may be storing to the same
`buffer_info` slot with a plain write as it queues a new packet. The two sides
share no lock, so the accesses are unsynchronized. Logically, the `DD` check
appears to gate the cleanup loop: if the reader observes a stale index, the
descriptor it points to is not expected to have `DD` set, so the loop body is
skipped. However, because both accesses are plain C accesses, the compiler may
assume no concurrent modification and tear, fuse, or re-load them, so the
value the reader acts on is not guaranteed to be a single consistent snapshot.
Unfortunately, we were unable to generate a reproducer for this bug.
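The compiler concern can be illustrated with a small userspace sketch. It is
not a reproducer; it only shows the pattern. With a plain load, the compiler
may legally re-load a field for each use, so a range check and the subsequent
array index could observe different values if another CPU stores in between.
Taking one volatile snapshot rules that out. `MODEL_READ_ONCE()` is a
hypothetical volatile-cast stand-in for the kernel's `READ_ONCE()`, and
`model_slot`, `model_status` and `model_eop_status()` are likewise
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for READ_ONCE(): the volatile access
 * obliges the compiler to perform exactly one load of 'x'. */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define MODEL_RING_SIZE 8

struct model_slot {
	uint16_t next_to_watch;
};

static uint32_t model_status[MODEL_RING_SIZE];

/* Return the status word of the descriptor that next_to_watch names.
 * Writing "idx = slot->next_to_watch" instead would allow the compiler to
 * read the field again for the bounds check and for the array index. */
static uint32_t model_eop_status(struct model_slot *slot)
{
	uint16_t idx = MODEL_READ_ONCE(slot->next_to_watch);

	if (idx >= MODEL_RING_SIZE)
		return 0;
	return model_status[idx];
}
```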
Potential Impact
The `DD` check likely masks this race in practice. Theoretically, however, if
the compiler transforms the unannotated accesses in a way that breaks the
driver's implicit visibility assumptions, the cleanup path could act on an
inconsistent index, risking memory corruption or a broken TX ring state. In
addition, the race produces recurring KCSAN reports that add noise to
concurrency testing.
Proposed Fix
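Annotating the accesses to `next_to_watch` with `READ_ONCE()` and
`WRITE_ONCE()` documents the intentional lockless access, prevents the
compiler from tearing, fusing, or re-loading them, and silences KCSAN. Note
that these macros are not general-purpose memory barriers; ordering between
`next_to_watch` and the descriptor status still relies on the driver's
existing barriers: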
```diff
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2749,7 +2749,7 @@ static int e1000_tso(struct e1000_adapter *adapter,
 		context_desc->cmd_and_length = cpu_to_le32(cmd_length);
 
 		buffer_info->time_stamp = jiffies;
-		buffer_info->next_to_watch = i;
+		WRITE_ONCE(buffer_info->next_to_watch, i);
 
 		if (++i == tx_ring->count)
 			i = 0;
@@ -3839,7 +3839,7 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter,
 	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	i = tx_ring->next_to_clean;
-	eop = tx_ring->buffer_info[i].next_to_watch;
+	eop = READ_ONCE(tx_ring->buffer_info[i].next_to_watch);
 	eop_desc = E1000_TX_DESC(*tx_ring, eop);
 
 	while ((eop_desc->upper.data & cpu_to_le32(E1000_TXD_STAT_DD)) &&
```
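*(Note: the other writers of `next_to_watch`, in `e1000_tx_map()` and
`e1000_tx_csum()`, should be converted to `WRITE_ONCE()` in the same way. The
read reported by KCSAN (`e1000_main.c:3871`) lies below the second hunk, so it
may be the re-read of `next_to_watch` at the bottom of the cleanup loop; that
read should likewise use `READ_ONCE()`.)*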
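For experimenting outside the kernel, the annotated pattern from the patch can
be approximated in userspace as below. This is a sketch under our own
assumptions: `MODEL_READ_ONCE()` and `MODEL_WRITE_ONCE()` are hypothetical
volatile-cast approximations of the kernel macros, and the helper
`model_set_next_to_watch()` is a hypothetical way to route every producer-side
store (TSO, checksum offload, mapping) through a single marked write. Like the
kernel macros, the volatile accesses constrain only the compiler and emit no
memory barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace approximations of READ_ONCE()/WRITE_ONCE().
 * A volatile access forces exactly one load or store of the object; it
 * does NOT order that access against other memory accesses. */
#define MODEL_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define MODEL_RING_SIZE 256

struct model_buffer_info {
	uint16_t next_to_watch;
};

struct model_tx_ring {
	struct model_buffer_info buffer_info[MODEL_RING_SIZE];
	unsigned int next_to_clean;
};

/* Producer side: every update of next_to_watch goes through one marked store. */
static inline void model_set_next_to_watch(struct model_buffer_info *bi,
					    unsigned int idx)
{
	assert(idx <= UINT16_MAX);	/* the field is only 16 bits wide */
	MODEL_WRITE_ONCE(bi->next_to_watch, (uint16_t)idx);
}

/* Consumer side: one marked load, as in the patched e1000_clean_tx_irq(). */
static inline unsigned int model_load_eop(struct model_tx_ring *ring)
{
	unsigned int i = ring->next_to_clean;

	return MODEL_READ_ONCE(ring->buffer_info[i].next_to_watch);
}
```

Using the values from the KCSAN report, storing `0x8d` into the slot at
`next_to_clean == 0x83` and then calling `model_load_eop()` yields `0x8d`.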
We hope this report is helpful.
Best regards,
RacePilot Team