Dear maintainers,

We have a Xen virtualization environment with six nearly identical nodes, all built on Supermicro X8DTU boards.

We run Debian stretch on them; the Xen hypervisor and the Linux kernel are both from Debian stretch, the latest versions at the time of writing.

Unfortunately, our igb devices randomly stop working, with the error message:

NETDEV WATCHDOG: enp1s0f0 (igb): transmit queue 0 timed out

The driver tries to recover/reset the adapter, but it does not succeed. Even shutting the interface down and bringing it back up does not help; a reboot is required to restore normal operation.

Each server is connected to our switch with two interfaces; the problem happens randomly on either one.

We have tried disabling MSI interrupts, but that did not help.

Unfortunately, we cannot reproduce the problem on demand: it happens randomly and frequently, but we cannot explicitly trigger it. It has happened on nearly all of our nodes, so I assume it is not a hardware problem.
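
Because the timeouts cannot be triggered on demand, it may help to tally them from the kernel log per interface and queue. The following is only a minimal sketch: watchdog_summary is a hypothetical helper name (not part of any existing tool), and it assumes the log lines match the format of the message quoted above exactly.

```shell
# Hypothetical helper (name chosen for illustration only).
# Reads kernel log text on stdin and prints one line per
# interface/driver/queue combination with the number of
# "NETDEV WATCHDOG ... timed out" messages seen for it.
watchdog_summary() {
    # Extract "<interface> <driver> <queue>" from each matching line;
    # lines that do not match are dropped by sed -n ... p.
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue \([0-9]*\) timed out.*/\1 \2 \3/p' |
        sort |
        uniq -c |
        awk '{ print $2, $3, "queue", $4, "count", $1 }'
}
```

It could be fed from the kernel log, for example with "dmesg | watchdog_summary", and run on each node to compare which interface and queue are affected.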

Our kernel/xen versions:

# uname -a
Linux node-3.cloud-b.dravanet.net 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux
# xl info
host                   : x
release                : 4.9.0-5-amd64
version                : #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
machine                : x86_64
nr_cpus                : 8
max_cpu_id             : 23
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3066
hw_caps                : b7ebfbff:029ee3ff:2c100800:00000001:00000000:00000000:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 196599
free_memory            : 94364
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 8
xen_extra              : .3-pre
xen_version            : 4.8.3-pre
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=4096M gnttab_max_frames=256
cc_compiler            : gcc (Debian 6.3.0-18) 6.3.0 20170516
cc_compile_by          : ijackson
cc_compile_domain      : chiark.greenend.org.uk
cc_compile_date        : Sat Nov 25 11:30:34 UTC 2017
build_id               : 23ac95af74d2e3f84c90068ae674c34e764649e7
xend_config_format     : 4

What else could we try to resolve this issue?

Thanks in advance,

Kojedzinszky Richárd
Euronet Magyarorszag Informatika Zrt.
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
