[Expired for linux (Ubuntu) because there has been no activity for 60
days.]
** Changed in: linux (Ubuntu)
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723482
Title:
qlcnic firmware hang detected kvm ganeti
Status in linux package in Ubuntu:
Expired
Bug description:
1) Ubuntu release:
Description: Ubuntu 16.04.3 LTS
Release: 16.04
2) Package version:
* linux-image-extra-4.4.0-96-generic (4.4.0-96.119)
* Also with HWE kernel (4.10.x)
3) What I expect:
I have a 10G interface (HP NC523SFP 10Gb 2-port) in an HP ProLiant
DL380p Gen8, BIOS P70 07/01/2015. The interface uses the qlcnic
module, and its two ports appear as ens2f0 and ens2f1. Both ports
also have VLANs configured.
I have installed Ganeti and set up bridges over those interfaces:
br-dmz over ens2f0 and br-str over ens2f1.
Everything should work without connectivity loss.
4) What happened instead:
The interface loses connectivity from time to time, although it
recovers by itself, and logs the following errors:
Oct 12 18:23:14 mazinger kernel: [107906.678468] qlcnic 0000:07:00.1: Pause
control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678470] qlcnic 0000:07:00.0: Pause
control frames disabled on all ports
Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0:
firmware hang detected
Oct 12 18:23:14 mazinger kernel: [107906.678482] qlcnic 0000:07:00.0: Dumping
hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_HALT_STATUS1:
0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_0_PC: 0x6d920,
PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_2_PC: 0x149,
PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1:
firmware hang detected
Oct 12 18:23:14 mazinger kernel: [107906.680385] qlcnic 0000:07:00.1: Dumping
hw/fw registers
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_HALT_STATUS1:
0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_0_PC: 0x6d920,
PEG_NET_1_PC: 0x6d976,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_2_PC: 0x149,
PEG_NET_3_PC: 0x6edbe,
Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_4_PC: 0x1e2f3
Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2)
entered disabled state
Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10)
entered disabled state
Oct 12 18:23:16 mazinger kernel: [107908.706988] qlcnic 0000:07:00.1:
Detected state change from DEV_NEED_RESET, skipping ack check
Oct 12 18:23:17 mazinger kernel: [107909.423713] qlcnic 0000:07:00.0 ens2f0:
Dump data 15044136 bytes captured, dump data address = ffffc900334c3000,
template header size 36864 bytes, template address = ffffc900193da000
Oct 12 18:23:21 mazinger kernel: [107912.800338] qlcnic 0000:07:00.0: loading
firmware from flash
Oct 12 18:23:27 mazinger kernel: [107919.137580] qlcnic 0000:07:00.0: Driver
v5.3.63, firmware v4.20.1
Oct 12 18:23:27 mazinger kernel: [107919.501555] qlcnic 0000:07:00.1: Driver
v5.3.63, firmware v4.20.1
Oct 12 18:23:28 mazinger kernel: [107920.425737] qlcnic 0000:07:00.0 ens2f0:
Rx Context[0] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.435780] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x8000] Created, state 0x2
Oct 12 18:23:28 mazinger kernel: [107920.453103] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x8008] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.598651] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x800a] Created, state 0x2
Oct 12 18:23:29 mazinger kernel: [107921.615752] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x800c] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.196706] qlcnic 0000:07:00.1 ens2f1:
Rx Context[1] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.406680] qlcnic 0000:07:00.1 ens2f1:
Tx Context[0x8001] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.422646] qlcnic 0000:07:00.1 ens2f1:
Tx Context[0x8009] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.439890] qlcnic 0000:07:00.1 ens2f1:
Tx Context[0x800b] Created, state 0x2
Oct 12 18:23:30 mazinger kernel: [107922.456417] qlcnic 0000:07:00.1 ens2f1:
Tx Context[0x800d] Created, state 0x2
Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0:
NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2)
entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2)
entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1:
NIC Link is up
Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10)
entered forwarding state
Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10)
entered forwarding state
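To put a number on each connectivity loss, the kernel timestamps of the bridge port state changes in the log above can be compared. The following is only a minimal sketch: the function name `outage_durations` is made up for illustration, and it assumes each log entry is on a single unwrapped line (as in /var/log/kern.log, not as wrapped in this mail).

```python
import re

# Matches bridge port state changes such as:
#   "... kernel: [107906.695571] br-dmz: port 1(ens2f0.2) entered disabled state"
PORT_STATE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bridge>[\w-]+): port \d+\((?P<port>[\w.]+)\) "
    r"entered (?P<state>disabled|forwarding) state"
)


def outage_durations(lines):
    """Return (bridge, port, seconds) for each disabled -> forwarding gap."""
    down_since = {}
    outages = []
    for line in lines:
        m = PORT_STATE.search(line)
        if not m:
            continue
        key = (m["bridge"], m["port"])
        ts = float(m["ts"])
        if m["state"] == "disabled":
            # Remember only the first "disabled" event of an outage.
            down_since.setdefault(key, ts)
        elif key in down_since:
            # First "forwarding" event closes the outage; repeats are ignored.
            outages.append((key[0], key[1], round(ts - down_since.pop(key), 3)))
    return outages


# Port state lines copied from the log above, unwrapped.
sample = [
    "Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2) entered disabled state",
    "Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10) entered disabled state",
    "Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2) entered forwarding state",
    "Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2) entered forwarding state",
    "Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10) entered forwarding state",
    "Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10) entered forwarding state",
]

for bridge, port, seconds in outage_durations(sample):
    print(f"{bridge} ({port}): down for about {seconds} s")
```

For the event above this gives roughly 16.8 seconds for br-dmz and 15.8 seconds for br-str, which is the time the firmware reset took from the bridge's point of view.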
---------------
Sometimes there are also kernel errors, and the server needs to be
rebooted to recover connectivity:
Oct 9 14:36:41 mazinger kernel: [262273.497512] ------------[ cut here
]------------
Oct 9 14:36:41 mazinger kernel: [262273.497821] WARNING: CPU: 6 PID: 0 at
/build/linux-z2ccW0/linux-4.4.0/net/sched/sch_generic.c:306
dev_watchdog+0x237/0x240()
Oct 9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0
(qlcnic): transmit queue 0 timed out
Oct 9 14:36:41 mazinger kernel: [262273.498579] Modules linked in: joydev
binfmt_misc hpwdt ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal input_leds
intel_powerclamp serio_raw sb_edac edac_core lpc_ich 8250_fintek hpilo ioatdma
shpchp ipmi_si ipmi_msghandler mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm
iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi 8021q garp mrp stp llc coretemp drbd lru_cache autofs4
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper qlcnic hid_generic tg3 igb dca hpsa vxlan cryptd usbhid
ptp psmouse ip6_udp_tunnel pata_acpi hid i2c_algo_bit scsi_transport_sas
pps_core udp_tunnel wmi fjes
Oct 9 14:36:41 mazinger kernel: [262273.498651] CPU: 6 PID: 0 Comm:
swapper/6 Not tainted 4.4.0-96-generic #119-Ubuntu
Oct 9 14:36:41 mazinger kernel: [262273.498652] Hardware name: HP ProLiant
DL380p Gen8, BIOS P70 07/01/2015
Oct 9 14:36:41 mazinger kernel: [262273.498654] 0000000000000286
fc090740aa4761f7 ffff881fbf783d98 ffffffff813fabd3
Oct 9 14:36:41 mazinger kernel: [262273.498666] ffff881fbf783de0
ffffffff81d715f8 ffff881fbf783dd0 ffffffff810812e2
Oct 9 14:36:41 mazinger kernel: [262273.498668] 0000000000000000
ffff881fade31b00 0000000000000006 ffff881fade30000
Oct 9 14:36:41 mazinger kernel: [262273.498681] Call Trace:
Oct 9 14:36:41 mazinger kernel: [262273.498683] <IRQ> [<ffffffff813fabd3>]
dump_stack+0x63/0x90
Oct 9 14:36:41 mazinger kernel: [262273.498691] [<ffffffff810812e2>]
warn_slowpath_common+0x82/0xc0
Oct 9 14:36:41 mazinger kernel: [262273.498693] [<ffffffff8108137c>]
warn_slowpath_fmt+0x5c/0x80
Oct 9 14:36:41 mazinger kernel: [262273.498697] [<ffffffff8175eca7>]
dev_watchdog+0x237/0x240
Oct 9 14:36:41 mazinger kernel: [262273.498700] [<ffffffff8175ea70>] ?
qdisc_rcu_free+0x40/0x40
Oct 9 14:36:41 mazinger kernel: [262273.498705] [<ffffffff810ed035>]
call_timer_fn+0x35/0x120
Oct 9 14:36:41 mazinger kernel: [262273.498708] [<ffffffff8175ea70>] ?
qdisc_rcu_free+0x40/0x40
Oct 9 14:36:41 mazinger kernel: [262273.498711] [<ffffffff810ed9ea>]
run_timer_softirq+0x23a/0x2f0
Oct 9 14:36:41 mazinger kernel: [262273.498714] [<ffffffff81085dc1>]
__do_softirq+0x101/0x290
Oct 9 14:36:41 mazinger kernel: [262273.498717] [<ffffffff810860c3>]
irq_exit+0xa3/0xb0
Oct 9 14:36:41 mazinger kernel: [262273.498721] [<ffffffff81845d22>]
smp_apic_timer_interrupt+0x42/0x50
Oct 9 14:36:41 mazinger kernel: [262273.498724] [<ffffffff81843fe2>]
apic_timer_interrupt+0x82/0x90
Oct 9 14:36:41 mazinger kernel: [262273.498726] <EOI> [<ffffffff816d680e>]
? cpuidle_enter_state+0x10e/0x2b0
Oct 9 14:36:41 mazinger kernel: [262273.498731] [<ffffffff816d69e7>]
cpuidle_enter+0x17/0x20
Oct 9 14:36:41 mazinger kernel: [262273.498735] [<ffffffff810c47c2>]
call_cpuidle+0x32/0x60
Oct 9 14:36:41 mazinger kernel: [262273.498737] [<ffffffff816d69c3>] ?
cpuidle_select+0x13/0x20
Oct 9 14:36:41 mazinger kernel: [262273.498739] [<ffffffff810c4a80>]
cpu_startup_entry+0x290/0x350
Oct 9 14:36:41 mazinger kernel: [262273.498743] [<ffffffff810517b4>]
start_secondary+0x154/0x190
Oct 9 14:36:41 mazinger kernel: [262273.498749] ---[ end trace
6388d35f388918bc ]---
Oct 9 14:36:41 mazinger kernel: [262273.498765] qlcnic 0000:07:00.0 ens2f0:
rds_ring=0 crb_rcv_producer=3113 producer=3114 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498773] qlcnic 0000:07:00.0 ens2f0:
rds_ring=1 crb_rcv_producer=1023 producer=0 num_desc=1024
Oct 9 14:36:41 mazinger kernel: [262273.498781] qlcnic 0000:07:00.0 ens2f0:
sds_ring=0 crb_sts_consumer=659 consumer=659 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498788] qlcnic 0000:07:00.0 ens2f0:
sds_ring=1 crb_sts_consumer=2894 consumer=2894 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498792] qlcnic 0000:07:00.0 ens2f0:
sds_ring=2 crb_sts_consumer=3092 consumer=3092 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498796] qlcnic 0000:07:00.0 ens2f0:
sds_ring=3 crb_sts_consumer=570 consumer=570 crb_intr_mask=0 num_desc=4096
Oct 9 14:36:41 mazinger kernel: [262273.498798] qlcnic 0000:07:00.0 ens2f0:
Tx ring=0 Context Id=0x8000
Oct 9 14:36:41 mazinger kernel: [262273.498800] qlcnic 0000:07:00.0 ens2f0:
xmit_finished=161917485, xmit_called=161920455, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498802] qlcnic 0000:07:00.0 ens2f0:
crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498805] qlcnic 0000:07:00.0 ens2f0:
hw_producer=481, sw_producer=481 sw_consumer=491, hw_consumer=491
Oct 9 14:36:41 mazinger kernel: [262273.498807] qlcnic 0000:07:00.0 ens2f0:
Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498809] qlcnic 0000:07:00.0 ens2f0:
Tx ring=1 Context Id=0x8008
Oct 9 14:36:41 mazinger kernel: [262273.498811] qlcnic 0000:07:00.0 ens2f0:
xmit_finished=152057037, xmit_called=152059997, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498813] qlcnic 0000:07:00.0 ens2f0:
crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498816] qlcnic 0000:07:00.0 ens2f0:
hw_producer=81, sw_producer=81 sw_consumer=91, hw_consumer=91
Oct 9 14:36:41 mazinger kernel: [262273.498818] qlcnic 0000:07:00.0 ens2f0:
Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498819] qlcnic 0000:07:00.0 ens2f0:
Tx ring=2 Context Id=0x800a
Oct 9 14:36:41 mazinger kernel: [262273.498821] qlcnic 0000:07:00.0 ens2f0:
xmit_finished=133645903, xmit_called=133648936, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498824] qlcnic 0000:07:00.0 ens2f0:
crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498827] qlcnic 0000:07:00.0 ens2f0:
hw_producer=572, sw_producer=572 sw_consumer=582, hw_consumer=582
Oct 9 14:36:41 mazinger kernel: [262273.498828] qlcnic 0000:07:00.0 ens2f0:
Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498830] qlcnic 0000:07:00.0 ens2f0:
Tx ring=3 Context Id=0x800c
Oct 9 14:36:41 mazinger kernel: [262273.498836] qlcnic 0000:07:00.0 ens2f0:
xmit_finished=162932700, xmit_called=162935603, xmit_on=0, xmit_off=2
Oct 9 14:36:41 mazinger kernel: [262273.498843] qlcnic 0000:07:00.0 ens2f0:
crb_intr_mask=0
Oct 9 14:36:41 mazinger kernel: [262273.498850] qlcnic 0000:07:00.0 ens2f0:
hw_producer=568, sw_producer=568 sw_consumer=578, hw_consumer=578
Oct 9 14:36:41 mazinger kernel: [262273.498857] qlcnic 0000:07:00.0 ens2f0:
Total desc=1024, Available desc=10
Oct 9 14:36:41 mazinger kernel: [262273.498863] qlcnic 0000:07:00.0 ens2f0:
Tx timeout, reset adapter context.
Oct 9 14:36:43 mazinger kernel: [262275.251864] qlcnic 0000:07:00.0: CDRP
command failed: [7]
Oct 9 14:36:43 mazinger kernel: [262275.252143] qlcnic 0000:07:00.0: Host
MBX regs(2)
Oct 9 14:36:43 mazinger kernel: [262275.252146] 00000039
Oct 9 14:36:43 mazinger kernel: [262275.252148] 00050032 <6>[262275.252150]
Oct 9 14:36:43 mazinger kernel: [262275.252153] qlcnic 0000:07:00.0: FW MBX
regs(3)
Oct 9 14:36:43 mazinger kernel: [262275.252155] 00000007
Oct 9 14:36:43 mazinger kernel: [262275.252156] 00000000 00000000
Oct 9 14:36:43 mazinger kernel: [262275.252158]
Oct 9 14:36:43 mazinger kernel: [262275.252166] qlcnic 0000:07:00.0 ens2f0:
Failed to Delete interrupts 7
Oct 9 14:36:43 mazinger kernel: [262275.279376] br-dmz: port 1(ens2f0.2)
entered disabled state
Oct 9 14:36:43 mazinger kernel: [262275.447095] qlcnic 0000:07:00.0 ens2f0:
Rx Context[0] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.493365] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x8000] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.509816] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x800e] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.527651] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x8010] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.543852] qlcnic 0000:07:00.0 ens2f0:
Tx Context[0x8012] Created, state 0x2
Oct 9 14:36:43 mazinger kernel: [262275.545966] qlcnic 0000:07:00.0 ens2f0:
qlcnic_reset_hw_context: soft reset complete
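The "Available desc=10" figures in the Tx ring dump above can be cross-checked from the producer and consumer indices. This is a sketch under an assumption: it uses the usual circular-ring convention (free slots are the distance from producer to consumer, wrapping at the ring size), which may or may not be exactly how the driver computes the value, and the helper name `tx_ring_free` is made up for illustration.

```python
def tx_ring_free(producer: int, consumer: int, num_desc: int) -> int:
    """Free descriptors in a circular ring (assumed convention, see above)."""
    if producer < consumer:
        return consumer - producer
    # Producer is at or past the consumer: wrap around the end of the ring.
    return consumer + num_desc - producer


# (sw_producer, sw_consumer) per Tx ring, copied from the dump above.
rings = {
    0: (481, 491),
    1: (81, 91),
    2: (572, 582),
    3: (568, 578),
}
TOTAL_DESC = 1024  # "Total desc=1024" in the dump

for ring, (producer, consumer) in rings.items():
    free = tx_ring_free(producer, consumer, TOTAL_DESC)
    print(f"Tx ring {ring}: {free} free, {TOTAL_DESC - free} in use")
```

Under that convention every ring has 10 free descriptors and 1014 in use, matching the logged values. All four Tx rings being almost full at the same moment is at least consistent with the adapter having stopped completing transmits before the watchdog fired.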
-----------
What I have tried to fix it:
- I have upgraded the interface firmware to the latest version
provided by HP:
# ethtool -i ens2f0
driver: qlcnic
version: 5.3.63
firmware-version: 4.20.1
expansion-rom-version:
bus-info: 0000:07:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
- I have opened a case with HP. Following their recommendations I have
upgraded the firmware of the server to the latest version. After
capturing an AHS (Active Health System) log, they told me there
isn't a hardware problem and it should be a software issue.
- I have tried the HWE kernel (version 4.10.x), which comes with a newer
version of the qlcnic module (5.3.65), but it didn't solve the problem.
- After reading about some problems with TSO in virtual environments,
I have disabled TSO/GSO and other offloads on the interfaces:
auto <iface>
iface <iface> inet manual
pre-up /sbin/ethtool --offload <iface> gso off tso off sg off gro off
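To confirm that the pre-up line above really took effect, the current state can be read back with `ethtool -k <iface>` (the long form is `--show-offload`). The following is a minimal sketch that parses such output and reports which of the four offloads (sg, tso, gso, gro) are still not off. The function names and the sample text are illustrative assumptions, not output captured from the affected server.

```python
# Long feature names that `ethtool -k` prints for sg, tso, gso and gro.
WANTED_OFF = (
    "scatter-gather",
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
)


def parse_offloads(text):
    """Map each feature name in `ethtool -k` output to 'on' or 'off'."""
    state = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skips blank lines and the "Features for <iface>:" header
        # Drop suffixes such as "[fixed]" or "[requested on]".
        state[name] = value.split()[0]
    return state


def still_enabled(text, wanted=WANTED_OFF):
    """Return the wanted features that are missing or not reported as 'off'."""
    state = parse_offloads(text)
    return [name for name in wanted if state.get(name) != "off"]


# Made-up sample in the usual `ethtool -k` layout (hypothetical values).
sample_output = """\
Features for ens2f0:
scatter-gather: off
\ttx-scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
"""

print(still_enabled(sample_output))
```

On a live system the text could come from `subprocess.run(["ethtool", "-k", "ens2f0"], capture_output=True, text=True).stdout`. An empty list means all four offloads are off.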
I have found similar problems by searching online, but all of them were
solved by applying one or more of the steps above. The issue seems to be
related to this kind of interface when it is used in virtual environments.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1723482/+subscriptions