** Package changed: kernel-package (Ubuntu) => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1723482

Title:
  qlcnic firmware hang detected kvm ganeti

Status in linux package in Ubuntu:
  New

Bug description:
  1) Ubuntu release:

  Description:  Ubuntu 16.04.3 LTS
  Release:      16.04

  2) Package version:

  * linux-image-extra-4.4.0-96-generic (4.4.0-96.119)
  * Also with HWE kernel (4.10.x)

  3) What I expect:

  I have a 10G interface (HP NC523SFP 10Gb 2-port) in an HP ProLiant
  DL380p Gen8, BIOS P70 07/01/2015. The interface uses the qlcnic
  module, and its two ports appear as ens2f0 and ens2f1. Both ports
  also have VLANs configured.

  I have installed Ganeti and created bridges over those interfaces:
  br-dmz over ens2f0 and br-str over ens2f1.

  Everything should work without connectivity loss.

  4) What happened instead:

  The interface loses connectivity from time to time with the following
  error, although it recovers by itself:

  Oct 12 18:23:14 mazinger kernel: [107906.678468] qlcnic 0000:07:00.1: Pause control frames disabled on all ports
  Oct 12 18:23:14 mazinger kernel: [107906.678470] qlcnic 0000:07:00.0: Pause control frames disabled on all ports
  Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected
  Oct 12 18:23:14 mazinger kernel: [107906.678482] qlcnic 0000:07:00.0: Dumping hw/fw registers
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
  Oct 12 18:23:14 mazinger kernel: [107906.678482] PEG_NET_4_PC: 0x1e2f3
  Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected
  Oct 12 18:23:14 mazinger kernel: [107906.680385] qlcnic 0000:07:00.1: Dumping hw/fw registers
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3e1f80,
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_0_PC: 0x6d920, PEG_NET_1_PC: 0x6d976,
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6edbe,
  Oct 12 18:23:14 mazinger kernel: [107906.680385] PEG_NET_4_PC: 0x1e2f3
  Oct 12 18:23:14 mazinger kernel: [107906.695571] br-dmz: port 1(ens2f0.2) entered disabled state
  Oct 12 18:23:15 mazinger kernel: [107907.690629] br-str: port 1(ens2f1.10) entered disabled state
  Oct 12 18:23:16 mazinger kernel: [107908.706988] qlcnic 0000:07:00.1: Detected state change from DEV_NEED_RESET, skipping ack check
  Oct 12 18:23:17 mazinger kernel: [107909.423713] qlcnic 0000:07:00.0 ens2f0: Dump data 15044136 bytes captured, dump data address = ffffc900334c3000, template header size 36864 bytes, template address = ffffc900193da000
  Oct 12 18:23:21 mazinger kernel: [107912.800338] qlcnic 0000:07:00.0: loading firmware from flash
  Oct 12 18:23:27 mazinger kernel: [107919.137580] qlcnic 0000:07:00.0: Driver v5.3.63, firmware v4.20.1
  Oct 12 18:23:27 mazinger kernel: [107919.501555] qlcnic 0000:07:00.1: Driver v5.3.63, firmware v4.20.1
  Oct 12 18:23:28 mazinger kernel: [107920.425737] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
  Oct 12 18:23:28 mazinger kernel: [107920.435780] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
  Oct 12 18:23:28 mazinger kernel: [107920.453103] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8008] Created, state 0x2
  Oct 12 18:23:29 mazinger kernel: [107921.598651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800a] Created, state 0x2
  Oct 12 18:23:29 mazinger kernel: [107921.615752] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800c] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.196706] qlcnic 0000:07:00.1 ens2f1: Rx Context[1] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.406680] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8001] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.422646] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x8009] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.439890] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800b] Created, state 0x2
  Oct 12 18:23:30 mazinger kernel: [107922.456417] qlcnic 0000:07:00.1 ens2f1: Tx Context[0x800d] Created, state 0x2
  Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up
  Oct 12 18:23:31 mazinger kernel: [107923.500360] br-dmz: port 1(ens2f0.2) entered forwarding state
  Oct 12 18:23:31 mazinger kernel: [107923.500375] br-dmz: port 1(ens2f0.2) entered forwarding state
  Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up
  Oct 12 18:23:31 mazinger kernel: [107923.500971] br-str: port 1(ens2f1.10) entered forwarding state
  Oct 12 18:23:31 mazinger kernel: [107923.500985] br-str: port 1(ens2f1.10) entered forwarding state
  ---------------
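
  To quantify how long each port stays down during such an event, the
  kernel timestamps can be compared. The following is a minimal Python
  sketch, not part of the original report: it assumes each log entry is
  on a single line (as in the log file itself), and the names ENTRY_RE,
  outage_durations and SAMPLE are made up for illustration. SAMPLE holds
  four entries copied from the log above.

```python
import re

# Kernel timestamp in brackets, then the qlcnic PCI address (domain:bus:dev.fn).
ENTRY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] qlcnic "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])"
)


def outage_durations(lines):
    """Return (pci_address, seconds) pairs from hang detection to link up."""
    hang_at = {}   # PCI address -> timestamp of the first pending hang
    outages = []
    for line in lines:
        match = ENTRY_RE.search(line)
        if match is None:
            continue  # not a qlcnic entry with a PCI address
        ts = float(match.group("ts"))
        pci = match.group("pci")
        if "firmware hang detected" in line:
            hang_at.setdefault(pci, ts)
        elif "NIC Link is up" in line and pci in hang_at:
            outages.append((pci, ts - hang_at.pop(pci)))
    return outages


# Entries copied from the log in this report (one entry per line).
SAMPLE = [
    "Oct 12 18:23:14 mazinger kernel: [107906.678475] qlcnic 0000:07:00.0: firmware hang detected",
    "Oct 12 18:23:14 mazinger kernel: [107906.680107] qlcnic 0000:07:00.1: firmware hang detected",
    "Oct 12 18:23:31 mazinger kernel: [107923.500128] qlcnic 0000:07:00.0 ens2f0: NIC Link is up",
    "Oct 12 18:23:31 mazinger kernel: [107923.500680] qlcnic 0000:07:00.1 ens2f1: NIC Link is up",
]

for pci, seconds in outage_durations(SAMPLE):
    print(f"{pci}: link down for {seconds:.3f} s")
```

  For the event above this gives about 16.8 seconds of lost
  connectivity on each port.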

  Sometimes it also triggers kernel warnings, and the server needs to be
  rebooted to recover connectivity:

  Oct  9 14:36:41 mazinger kernel: [262273.497512] ------------[ cut here ]------------
  Oct  9 14:36:41 mazinger kernel: [262273.497821] WARNING: CPU: 6 PID: 0 at /build/linux-z2ccW0/linux-4.4.0/net/sched/sch_generic.c:306 dev_watchdog+0x237/0x240()
  Oct  9 14:36:41 mazinger kernel: [262273.498083] NETDEV WATCHDOG: ens2f0 (qlcnic): transmit queue 0 timed out
  Oct  9 14:36:41 mazinger kernel: [262273.498579] Modules linked in: joydev binfmt_misc hpwdt ipmi_ssif bridge intel_rapl x86_pkg_temp_thermal input_leds intel_powerclamp serio_raw sb_edac edac_core lpc_ich 8250_fintek hpilo ioatdma shpchp ipmi_si ipmi_msghandler mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc coretemp drbd lru_cache autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper qlcnic hid_generic tg3 igb dca hpsa vxlan cryptd usbhid ptp psmouse ip6_udp_tunnel pata_acpi hid i2c_algo_bit scsi_transport_sas pps_core udp_tunnel wmi fjes
  Oct  9 14:36:41 mazinger kernel: [262273.498651] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.4.0-96-generic #119-Ubuntu
  Oct  9 14:36:41 mazinger kernel: [262273.498652] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
  Oct  9 14:36:41 mazinger kernel: [262273.498654]  0000000000000286 fc090740aa4761f7 ffff881fbf783d98 ffffffff813fabd3
  Oct  9 14:36:41 mazinger kernel: [262273.498666]  ffff881fbf783de0 ffffffff81d715f8 ffff881fbf783dd0 ffffffff810812e2
  Oct  9 14:36:41 mazinger kernel: [262273.498668]  0000000000000000 ffff881fade31b00 0000000000000006 ffff881fade30000
  Oct  9 14:36:41 mazinger kernel: [262273.498681] Call Trace:
  Oct  9 14:36:41 mazinger kernel: [262273.498683]  <IRQ>  [<ffffffff813fabd3>] dump_stack+0x63/0x90
  Oct  9 14:36:41 mazinger kernel: [262273.498691]  [<ffffffff810812e2>] warn_slowpath_common+0x82/0xc0
  Oct  9 14:36:41 mazinger kernel: [262273.498693]  [<ffffffff8108137c>] warn_slowpath_fmt+0x5c/0x80
  Oct  9 14:36:41 mazinger kernel: [262273.498697]  [<ffffffff8175eca7>] dev_watchdog+0x237/0x240
  Oct  9 14:36:41 mazinger kernel: [262273.498700]  [<ffffffff8175ea70>] ? qdisc_rcu_free+0x40/0x40
  Oct  9 14:36:41 mazinger kernel: [262273.498705]  [<ffffffff810ed035>] call_timer_fn+0x35/0x120
  Oct  9 14:36:41 mazinger kernel: [262273.498708]  [<ffffffff8175ea70>] ? qdisc_rcu_free+0x40/0x40
  Oct  9 14:36:41 mazinger kernel: [262273.498711]  [<ffffffff810ed9ea>] run_timer_softirq+0x23a/0x2f0
  Oct  9 14:36:41 mazinger kernel: [262273.498714]  [<ffffffff81085dc1>] __do_softirq+0x101/0x290
  Oct  9 14:36:41 mazinger kernel: [262273.498717]  [<ffffffff810860c3>] irq_exit+0xa3/0xb0
  Oct  9 14:36:41 mazinger kernel: [262273.498721]  [<ffffffff81845d22>] smp_apic_timer_interrupt+0x42/0x50
  Oct  9 14:36:41 mazinger kernel: [262273.498724]  [<ffffffff81843fe2>] apic_timer_interrupt+0x82/0x90
  Oct  9 14:36:41 mazinger kernel: [262273.498726]  <EOI>  [<ffffffff816d680e>] ? cpuidle_enter_state+0x10e/0x2b0
  Oct  9 14:36:41 mazinger kernel: [262273.498731]  [<ffffffff816d69e7>] cpuidle_enter+0x17/0x20
  Oct  9 14:36:41 mazinger kernel: [262273.498735]  [<ffffffff810c47c2>] call_cpuidle+0x32/0x60
  Oct  9 14:36:41 mazinger kernel: [262273.498737]  [<ffffffff816d69c3>] ? cpuidle_select+0x13/0x20
  Oct  9 14:36:41 mazinger kernel: [262273.498739]  [<ffffffff810c4a80>] cpu_startup_entry+0x290/0x350
  Oct  9 14:36:41 mazinger kernel: [262273.498743]  [<ffffffff810517b4>] start_secondary+0x154/0x190
  Oct  9 14:36:41 mazinger kernel: [262273.498749] ---[ end trace 6388d35f388918bc ]---
  Oct  9 14:36:41 mazinger kernel: [262273.498765] qlcnic 0000:07:00.0 ens2f0: rds_ring=0 crb_rcv_producer=3113 producer=3114 num_desc=4096
  Oct  9 14:36:41 mazinger kernel: [262273.498773] qlcnic 0000:07:00.0 ens2f0: rds_ring=1 crb_rcv_producer=1023 producer=0 num_desc=1024
  Oct  9 14:36:41 mazinger kernel: [262273.498781] qlcnic 0000:07:00.0 ens2f0: sds_ring=0 crb_sts_consumer=659 consumer=659 crb_intr_mask=0 num_desc=4096
  Oct  9 14:36:41 mazinger kernel: [262273.498788] qlcnic 0000:07:00.0 ens2f0: sds_ring=1 crb_sts_consumer=2894 consumer=2894 crb_intr_mask=0 num_desc=4096
  Oct  9 14:36:41 mazinger kernel: [262273.498792] qlcnic 0000:07:00.0 ens2f0: sds_ring=2 crb_sts_consumer=3092 consumer=3092 crb_intr_mask=0 num_desc=4096
  Oct  9 14:36:41 mazinger kernel: [262273.498796] qlcnic 0000:07:00.0 ens2f0: sds_ring=3 crb_sts_consumer=570 consumer=570 crb_intr_mask=0 num_desc=4096
  Oct  9 14:36:41 mazinger kernel: [262273.498798] qlcnic 0000:07:00.0 ens2f0: Tx ring=0 Context Id=0x8000
  Oct  9 14:36:41 mazinger kernel: [262273.498800] qlcnic 0000:07:00.0 ens2f0: xmit_finished=161917485, xmit_called=161920455, xmit_on=0, xmit_off=2
  Oct  9 14:36:41 mazinger kernel: [262273.498802] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct  9 14:36:41 mazinger kernel: [262273.498805] qlcnic 0000:07:00.0 ens2f0: hw_producer=481, sw_producer=481 sw_consumer=491, hw_consumer=491
  Oct  9 14:36:41 mazinger kernel: [262273.498807] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct  9 14:36:41 mazinger kernel: [262273.498809] qlcnic 0000:07:00.0 ens2f0: Tx ring=1 Context Id=0x8008
  Oct  9 14:36:41 mazinger kernel: [262273.498811] qlcnic 0000:07:00.0 ens2f0: xmit_finished=152057037, xmit_called=152059997, xmit_on=0, xmit_off=2
  Oct  9 14:36:41 mazinger kernel: [262273.498813] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct  9 14:36:41 mazinger kernel: [262273.498816] qlcnic 0000:07:00.0 ens2f0: hw_producer=81, sw_producer=81 sw_consumer=91, hw_consumer=91
  Oct  9 14:36:41 mazinger kernel: [262273.498818] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct  9 14:36:41 mazinger kernel: [262273.498819] qlcnic 0000:07:00.0 ens2f0: Tx ring=2 Context Id=0x800a
  Oct  9 14:36:41 mazinger kernel: [262273.498821] qlcnic 0000:07:00.0 ens2f0: xmit_finished=133645903, xmit_called=133648936, xmit_on=0, xmit_off=2
  Oct  9 14:36:41 mazinger kernel: [262273.498824] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct  9 14:36:41 mazinger kernel: [262273.498827] qlcnic 0000:07:00.0 ens2f0: hw_producer=572, sw_producer=572 sw_consumer=582, hw_consumer=582
  Oct  9 14:36:41 mazinger kernel: [262273.498828] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct  9 14:36:41 mazinger kernel: [262273.498830] qlcnic 0000:07:00.0 ens2f0: Tx ring=3 Context Id=0x800c
  Oct  9 14:36:41 mazinger kernel: [262273.498836] qlcnic 0000:07:00.0 ens2f0: xmit_finished=162932700, xmit_called=162935603, xmit_on=0, xmit_off=2
  Oct  9 14:36:41 mazinger kernel: [262273.498843] qlcnic 0000:07:00.0 ens2f0: crb_intr_mask=0
  Oct  9 14:36:41 mazinger kernel: [262273.498850] qlcnic 0000:07:00.0 ens2f0: hw_producer=568, sw_producer=568 sw_consumer=578, hw_consumer=578
  Oct  9 14:36:41 mazinger kernel: [262273.498857] qlcnic 0000:07:00.0 ens2f0: Total desc=1024, Available desc=10
  Oct  9 14:36:41 mazinger kernel: [262273.498863] qlcnic 0000:07:00.0 ens2f0: Tx timeout, reset adapter context.
  Oct  9 14:36:43 mazinger kernel: [262275.251864] qlcnic 0000:07:00.0: CDRP command failed: [7]
  Oct  9 14:36:43 mazinger kernel: [262275.252143] qlcnic 0000:07:00.0: Host MBX regs(2)
  Oct  9 14:36:43 mazinger kernel: [262275.252146] 00000039
  Oct  9 14:36:43 mazinger kernel: [262275.252148] 00050032 <6>[262275.252150]
  Oct  9 14:36:43 mazinger kernel: [262275.252153] qlcnic 0000:07:00.0: FW MBX regs(3)
  Oct  9 14:36:43 mazinger kernel: [262275.252155] 00000007
  Oct  9 14:36:43 mazinger kernel: [262275.252156] 00000000 00000000
  Oct  9 14:36:43 mazinger kernel: [262275.252158]
  Oct  9 14:36:43 mazinger kernel: [262275.252166] qlcnic 0000:07:00.0 ens2f0: Failed to Delete interrupts 7
  Oct  9 14:36:43 mazinger kernel: [262275.279376] br-dmz: port 1(ens2f0.2) entered disabled state
  Oct  9 14:36:43 mazinger kernel: [262275.447095] qlcnic 0000:07:00.0 ens2f0: Rx Context[0] Created, state 0x2
  Oct  9 14:36:43 mazinger kernel: [262275.493365] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8000] Created, state 0x2
  Oct  9 14:36:43 mazinger kernel: [262275.509816] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x800e] Created, state 0x2
  Oct  9 14:36:43 mazinger kernel: [262275.527651] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8010] Created, state 0x2
  Oct  9 14:36:43 mazinger kernel: [262275.543852] qlcnic 0000:07:00.0 ens2f0: Tx Context[0x8012] Created, state 0x2
  Oct  9 14:36:43 mazinger kernel: [262275.545966] qlcnic 0000:07:00.0 ens2f0: qlcnic_reset_hw_context: soft reset complete
  -----------
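
  The "Available desc=10" values in the Tx ring dump can be reproduced
  from the producer and consumer indices with ordinary ring-buffer
  arithmetic. The Python sketch below is an illustration only: the
  formula is the generic free-slot calculation for a circular buffer,
  not code taken from the qlcnic driver, and the names tx_available and
  RINGS are hypothetical. The index values are copied from the log
  above.

```python
def tx_available(producer, consumer, num_desc):
    """Free descriptors in a circular ring (generic ring-buffer formula)."""
    if producer < consumer:
        return consumer - producer
    # Producer has wrapped past (or equals) the consumer index.
    return consumer + num_desc - producer


# (ring, sw_producer, sw_consumer) as logged for ens2f0; Total desc=1024.
RINGS = [
    (0, 481, 491),
    (1, 81, 91),
    (2, 572, 582),
    (3, 568, 578),
]
NUM_DESC = 1024

for ring, producer, consumer in RINGS:
    free = tx_available(producer, consumer, NUM_DESC)
    print(f"Tx ring {ring}: {free} free, {NUM_DESC - free} in use")
```

  Under this reading, every Tx ring had only 10 of its 1024 descriptors
  free when the watchdog fired, which would be consistent with the
  adapter no longer completing transmits.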

  What I have tried to fix it:

  - I have upgraded the interface firmware to the latest version
  provided by HP:

  # ethtool -i ens2f0
  driver: qlcnic
  version: 5.3.63
  firmware-version: 4.20.1
  expansion-rom-version: 
  bus-info: 0000:07:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: no

  - I have opened a case with HP. Following their recommendations, I have
  upgraded the server firmware to the latest version. After capturing an
  AHS (Active Health System) log, they told me there isn't a hardware
  problem and that it should be a software issue.

  - I have tried the HWE kernel (version 4.10.x), which comes with a newer
  version of the qlcnic module (5.3.65), but it didn't solve the problem.

  - After reading about some problems with TSO and virtual environments,
  I have disabled TSO/GSO and other offloads on the interfaces:

  auto <iface>
  iface <iface> inet manual
      pre-up /sbin/ethtool --offload <iface> gso off tso off sg off gro off
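
  To confirm that the pre-up hook really took effect, the output of
  "ethtool -k <iface>" can be checked for the four features. The Python
  sketch below is one possible way to do that, under stated
  assumptions: the helper names (EXPECTED_OFF, parse_features,
  still_enabled) are made up, and SAMPLE_OUTPUT is an illustrative
  excerpt in the usual "ethtool -k" format, not output captured from
  the affected server.

```python
# Short names used with "ethtool --offload" mapped to the long feature
# names that "ethtool -k" prints.
EXPECTED_OFF = {
    "sg": "scatter-gather",
    "tso": "tcp-segmentation-offload",
    "gso": "generic-segmentation-offload",
    "gro": "generic-receive-offload",
}


def parse_features(text):
    """Parse 'name: value' lines from 'ethtool -k' output into a dict."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # header line such as "Features for ens2f0:"
        # Keep only on/off; drop suffixes such as "[fixed]".
        features[name.strip()] = value.split()[0]
    return features


def still_enabled(text):
    """Return the short names of offloads that are not reported as off."""
    features = parse_features(text)
    return sorted(
        short for short, long_name in EXPECTED_OFF.items()
        if features.get(long_name) != "off"
    )


# Illustrative excerpt only (assumption), with GSO deliberately left on.
SAMPLE_OUTPUT = """Features for ens2f0:
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
"""

print(still_enabled(SAMPLE_OUTPUT))
```

  On a real host the text would come from running "ethtool -k" on each
  port, for example through subprocess.run with capture_output=True.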

  
  I have found similar problems by searching online, but all of them were
  solved by applying one or more of those measures. The issue seems to be
  related to this kind of interface when used with virtual environments.
