Dear Kernel hackers,

I have a machine with a self-built, non-tainted kernel which exhibits memory
corruption as soon as I execute

    while true; do cat /proc/self/net/dev > /dev/null; done

as a normal user.
I am running 4.11.3 (almost vanilla, with only the Gentoo patches applied) on
mostly standard hardware (Intel CPU + GPU).
I can also reproduce this with 4.9 on the same machine.
The RAM has already been replaced. Due to a BIOS bug, the machine needs
"iommu=soft" as a kernel parameter, but nothing special otherwise.
The corruption shows up in two ways.
Often as a kernel message:
    Corrupted low memory at ffff88000000b000 (b000 phys) = 0016e109
And almost every time when running:
    memtester 15G
(the machine has 16 GB of RAM).
In the memtester output, the values it reports as corrupted match the numbers
shown in:
    /proc/self/net/dev
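
For anyone who wants to repeat that comparison without doing it by eye, a
small script along the following lines could be used. This is only a sketch:
the helper names (parse_net_dev, find_matches) and all counter values in
SAMPLE are invented for illustration and are not taken from the affected
machine. It assumes the usual /proc/net/dev layout of two header lines
followed by one "iface: counters..." line per interface.

```python
# Sketch: check whether a word found in corrupted memory equals one of the
# interface counters listed in /proc/net/dev-style text.
# All sample values below are invented, not real data from the report.

def parse_net_dev(text):
    """Return {interface: [counter, ...]} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines()[2:]:       # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        counters[name.strip()] = [int(v) for v in values.split()]
    return counters


def find_matches(word, counters):
    """List (interface, column index) pairs whose counter equals `word`."""
    return [(name, idx)
            for name, vals in counters.items()
            for idx, val in enumerate(vals)
            if val == word]


SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    4096      32    0    0    0     0          0         0     4096      32    0    0    0     0       0          0
  eth0: 9876543  123456    0    0    0     0          0        12  1234567   65432    0    0    0     0       0          0
"""

if __name__ == "__main__":
    table = parse_net_dev(SAMPLE)
    # 0x1E240 is 123456 decimal, the invented rx packet count of eth0 above.
    print(find_matches(0x1E240, table))
```

On a real system, SAMPLE would be replaced by the contents of
/proc/self/net/dev read shortly before the memory test, and the word by a
value taken from the memtester failure report.
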
After each boot, the memory page where the corruption appears seems to change
slightly, but it is usually in the region around 0x94F6000 (physical
address).
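
To relate such physical addresses to what the kernel prints, the arithmetic
can be sketched as below. The direct-map base PAGE_OFFSET is an assumption
inferred from the "ffff88000000b000 (b000 phys)" message quoted earlier; it
holds for x86_64 kernels of this generation without memory-layout
randomization and may differ elsewhere. The function names are made up for
this example.

```python
# Sketch: convert between physical addresses, page frame numbers and
# direct-map virtual addresses on x86_64 with 4 KiB pages.

PAGE_SHIFT = 12                        # 4 KiB pages
PAGE_OFFSET = 0xFFFF880000000000       # assumed direct-map base (see lead-in)


def phys_to_pfn(phys):
    """Page frame number containing the physical address."""
    return phys >> PAGE_SHIFT


def page_base(phys):
    """Physical address of the start of the containing page."""
    return phys & ~((1 << PAGE_SHIFT) - 1)


def phys_to_directmap(phys):
    """Kernel direct-map virtual address for a physical address."""
    return PAGE_OFFSET + phys


if __name__ == "__main__":
    # The low-memory address from the kernel message:
    print(hex(phys_to_directmap(0xB000)))     # 0xffff88000000b000
    # The region usually hit according to memtester:
    print(hex(phys_to_pfn(0x94F6000)))        # 0x94f6
```
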
I have attached my kernel config, gzipped.
I would be very grateful for any advice on how to debug this further. It no
longer really looks like a hardware issue to me,
but if it could be one, please enlighten me.
Please include me in replies, as I am not subscribed to the list.
In case it is relevant, my network controller is:
    02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Thanks and all the best,
Oliver Freyermuth
Attachment: kernconfig.gz (application/gzip)

