I went through the email below and cut out the information I consider
to be noise.  What we are left with is the key bits such as kernel
version (2.6.37), driver version (igb-5.0.6), and other critical
information such as memory and interrupt layouts.

Looking over the Tx hang error reports, it appears that for whatever
reason the Tx descriptor is not being written back when the transmit
completes.  Without that write-back the Tx clean-up path isn't going
to make forward progress.  Is this something that happens often, or
does it take a while to reproduce the issue?
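
To illustrate why a missing write-back stalls clean-up, here is a toy
model in Python.  It is only a sketch of the idea, not the igb code:
the function name, the tiny ring, and the dictionary layout are made
up for illustration.  The one assumption taken from the driver is that
clean-up only advances past a packet once the DD bit is set on that
packet's last (next_to_watch) descriptor.

```python
DD = 0x1  # descriptor-done bit, set by the device on write-back

def clean_tx(status, next_to_watch, next_to_clean, next_to_use):
    """Toy model: advance next_to_clean past every completed packet.

    status[i]        -- status word of descriptor i
    next_to_watch[i] -- index of the last descriptor of the packet
                        that starts at descriptor i (hypothetical layout)
    """
    size = len(status)
    while next_to_clean != next_to_use:
        eop = next_to_watch[next_to_clean]
        if not status[eop] & DD:
            break                      # no write-back seen: stop here
        next_to_clean = (eop + 1) % size
    return next_to_clean

# Packet A uses descriptors 2-3, packet B uses descriptor 4.
watch = {2: 3, 4: 4}

# B was written back, A was not: clean-up is stuck at 2.
stuck = clean_tx([0, 0, 0, 0, DD, 0, 0, 0], watch, 2, 5)

# Both written back: clean-up catches up with next_to_use.
done = clean_tx([0, 0, 0, DD, DD, 0, 0, 0], watch, 2, 5)

print(stuck, done)
```

In the stuck case nothing behind the unwritten descriptor is ever
freed, which is the situation the watchdog eventually reports.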

Unfortunately the igb driver you are using doesn't include much in the
way of debug information.  If you could switch over to a newer kernel
with the in-kernel driver, you could enable more debug output via the
command "ethtool -s eth0 msglvl 0x2407".  If nothing else, you might
look at copying the descriptor dump code out of the in-kernel igb
driver into the driver you are building for your kernel.  Then when
the Tx hang occurs the driver should print the contents of the Tx
buffer info and Tx descriptor rings.  If you could send us a dump of
that we might be able to understand this issue better.
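
For reference, the msglvl value 0x2407 is a bitmask.  The short Python
sketch below decodes it, assuming the standard NETIF_MSG_* bit
assignments from the kernel's netdevice.h and the flag names that
ethtool uses for them.  The helper name decode_msglvl is just for
illustration.

```python
# Bit values as defined for NETIF_MSG_* in include/linux/netdevice.h;
# names follow the msglvl flag names accepted by ethtool.
NETIF_MSG_BITS = {
    0x0001: "drv",
    0x0002: "probe",
    0x0004: "link",
    0x0008: "timer",
    0x0010: "ifdown",
    0x0020: "ifup",
    0x0040: "rx_err",
    0x0080: "tx_err",
    0x0100: "tx_queued",
    0x0200: "intr",
    0x0400: "tx_done",
    0x0800: "rx_status",
    0x1000: "pktdata",
    0x2000: "hw",
    0x4000: "wol",
}

def decode_msglvl(mask):
    """Return the names of the message-level flags set in mask."""
    return [name for bit, name in sorted(NETIF_MSG_BITS.items()) if mask & bit]

flags = decode_msglvl(0x2407)
print(flags)
```

So 0x2407 turns on the drv, probe, link, tx_done and hw message
classes.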

What we are looking for is to try and understand why the DD bit of the
next_to_watch descriptor is not showing as being written back.  So for
example this could be some sort of race or coherency issue that is
resulting in the write-back from the device being overwritten by an
update from the CPU.
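
As a quick sanity check on the numbers in the first hang report quoted
below, here is a small Python sketch.  It assumes that desc.status is
the write-back status word of the next_to_watch descriptor with DD in
bit 0, and it assumes a 256-entry Tx ring (the ring size does not
appear in the log, so treat the pending count as illustrative only).

```python
# Values copied from the first "Detected Tx Unit Hang" report in the log.
report = {
    "TDH": 0x46,
    "TDT": 0x46,
    "next_to_use": 0x46,
    "next_to_clean": 0x5C,
    "time_stamp": 0x1336F7,
    "jiffies": 0x133788,
    "desc_status": 0x1568200,
}

TXD_STAT_DD = 0x1  # descriptor-done bit
RING_SIZE = 256    # ASSUMPTION: ring size is not shown in the log

def summarize(r, ring_size=RING_SIZE):
    """Derive a few facts from one hang report."""
    return {
        # Has the device marked the watched descriptor as done?
        "dd_set": bool(r["desc_status"] & TXD_STAT_DD),
        # Head == tail means the device has nothing left to fetch.
        "hw_idle": r["TDH"] == r["TDT"],
        # Descriptors the driver still considers outstanding.
        "pending": (r["next_to_use"] - r["next_to_clean"]) % ring_size,
        # How long the oldest buffer has been waiting, in jiffies.
        "stalled_jiffies": r["jiffies"] - r["time_stamp"],
    }

summary = summarize(report)
print(summary)
```

The device side looks idle (TDH equals TDT) while the DD bit in the
status the CPU reads is clear, which is what points at the write-back
not being visible rather than the hardware being stuck mid-transmit.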

- Alex

On Wed, Feb 22, 2017 at 10:26 PM, Fujinaka, Todd
<todd.fujin...@intel.com> wrote:
> Please don’t send this directly to me.
>
> We don’t have any ARM systems in-house and it may be better for you to ask 
> the ARM mailing list.
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> From: hechen...@avsolutiontech.com [mailto:hechen...@avsolutiontech.com]
> Sent: Wednesday, February 22, 2017 10:22 PM
> To: Fujinaka, Todd <todd.fujin...@intel.com>
> Subject: Re: RE: [E1000-devel] Linux i350 driver problem
>
> Thank you for your reply. This is the dmesg output from when the error occurs.
>
> ------------------------------------------ dmesg start ------------------------------------------
> root@dm816x:~# dmesg
> Linux version 2.6.37 (root@avst-linux-server) (gcc version 4.5.3 20110311 
> (prerelease) (GCC) ) #38 Thu May 5 11:48:44 CST 2016
> CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c53c7f
> CPU: VIPT nonaliasing data cache, VIPT aliasing instruction cache
> Machine: ti8168evm
> vram size = 31457280 at 0x0
> ti81xx_reserve: ### Reserved DDR region @8ff00000
> reserved size = 31457280 at 0x0
> FB: Reserving 31457280 bytes SDRAM for VRAM
> Memory policy: ECC disabled, Data cache writeback
> OMAP chip is TI8168 2.1
> On node 0 totalpages: 57600
> free_area_init_node: node 0, pgdat c0494cd4, node_mem_map c04d0000
>   Normal zone: 512 pages used for memmap
>   Normal zone: 0 pages reserved
>   Normal zone: 57088 pages, LIFO batch:15
> pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> pcpu-alloc: [0] 0
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 57088
> Kernel command line: mem=256M console=ttyO2,115200n8 root=/dev/mmcblk0p2 rw 
> rootdelay=3 nolock vram=30M notifyk.vpssm3_sva=0xBEE00000 ddr_mem=1024M
> PID hash table entries: 1024 (order: 0, 4096 bytes)
> Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> Memory: 224MB 1MB = 225MB total
> Memory: 223184k/223184k available, 38960k reserved, 0K highmem
> Virtual kernel memory layout:
>     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
>     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
>     DMA     : 0xffc00000 - 0xffe00000   (   2 MB)
>     vmalloc : 0xd0800000 - 0xf8000000   ( 632 MB)
>     lowmem  : 0xc0000000 - 0xd0000000   ( 256 MB)
>     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
>     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
>       .init : 0xc0008000 - 0xc003f000   ( 220 kB)
>       .text : 0xc003f000 - 0xc0456000   (4188 kB)
>       .data : 0xc0456000 - 0xc0496540   ( 258 kB)
> SLUB: Genslabs=11, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> NR_IRQS:375
> IRQ: Found an INTC at 0xfa200000 (revision 5.0) with 128 interrupts
> Total of 128 interrupts on 1 active controller
> GPMC revision 6.0
> Trying to install interrupt handler for IRQ368
> Trying to install interrupt handler for IRQ369
> Trying to install interrupt handler for IRQ370
> Trying to install interrupt handler for IRQ371
> Trying to install interrupt handler for IRQ372
> Trying to install interrupt handler for IRQ373
> Trying to install interrupt handler for IRQ374
> Trying to install type control for IRQ375
> Trying to set irq flags for IRQ375

<...>

> Intel(R) Gigabit Ethernet Network Driver - version 5.0.6
> Copyright (c) 2007-2013 Intel Corporation.
> kl[igb_probe]2525:XXXXXXXXXXXXXXXXenter
> PCI: enabling device 0000:01:00.0 (0140 -> 0142)
> igb 0000:01:00.0: Failed to initialize MSI-X interrupts. Falling back to MSI 
> interrupts.
> igb 0000:01:00.0: Failed to initialize MSI interrupts.  Falling back to 
> legacy interrupts.
> debug:[aba_ex_read_board_mac]250:  aa 50 b6 e6 9b a6
> igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:01:00.0: eth0: (PCIe:5.0GT/s:Width x2)
> igb 0000:01:00.0: eth0: MAC: aa:50:b6:e6:9b:a6
> igb 0000:01:00.0: eth0: PBA No: 106300-000
> igb 0000:01:00.0: LRO is disabled
> igb 0000:01:00.0: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
> [igb_driver_debug_main]401:reg_val = 7068342 
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> kl[igb_probe]2525:XXXXXXXXXXXXXXXXenter
> PCI: enabling device 0000:01:00.1 (0140 -> 0142)
> igb 0000:01:00.1: Failed to initialize MSI-X interrupts. Falling back to MSI 
> interrupts.
> igb 0000:01:00.1: Failed to initialize MSI interrupts.  Falling back to 
> legacy interrupts.
> debug:[aba_ex_read_board_mac]250:  aa 50 b6 e6 9b a6
> ata1: SATA link down (SStatus 0 SControl 300)
> igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:01:00.1: eth1: (PCIe:5.0GT/s:Width x2)
> igb 0000:01:00.1: eth1: MAC: aa:50:b6:e6:9b:a7
> igb 0000:01:00.1: eth1: PBA No: 106300-000
> igb 0000:01:00.1: LRO is disabled
> igb 0000:01:00.1: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
> [igb_driver_debug_main]401:reg_val = 7068342 
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

<...>

> igb 0000:01:00.0: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <46>
>   TDT                  <46>
>   next_to_use          <46>
>   next_to_clean        <5c>
> buffer_info[next_to_clean]
>   time_stamp           <1336f7>
>   next_to_watch        <ffc175d0>
>   jiffies              <133788>
>   desc.status          <1568200>
> igb 0000:01:00.0: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <46>
>   TDT                  <46>
>   next_to_use          <46>
>   next_to_clean        <5c>
> buffer_info[next_to_clean]
>   time_stamp           <1336f7>
>   next_to_watch        <ffc175d0>
>   jiffies              <133850>
>   desc.status          <1568200>
> igb 0000:01:00.0: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH                  <46>
>   TDT                  <46>
>   next_to_use          <46>
>   next_to_clean        <5c>
> buffer_info[next_to_clean]
>   time_stamp           <1336f7>
>   next_to_watch        <ffc175d0>
>   jiffies              <133918>
>   desc.status          <1568200>
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x148/0x230()
> NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
> Modules linked in: aur5g8ke_face_lcd avst_digit_audio ti81xxhdmi ti81xxfb 
> vpss osa_kermod syslink
> Backtrace:
> [<c004cfac>] (dump_backtrace+0x0/0x110) from [<c033900c>] 
> (dump_stack+0x18/0x1c)
>  r6:c042b298 r5:00000102 r4:c0457e10 r3:60000113
> [<c0338ff4>] (dump_stack+0x0/0x1c) from [<c0072910>] 
> (warn_slowpath_common+0x54/0x6c)
> [<c00728bc>] (warn_slowpath_common+0x0/0x6c) from [<c00729cc>] 
> (warn_slowpath_fmt+0x38/0x40)
>  r8:c02c78bc r7:00000100 r6:00000000 r5:c04cb59c r4:cdc0c000
> r3:00000009
> [<c0072994>] (warn_slowpath_fmt+0x0/0x40) from [<c02c7a04>] 
> (dev_watchdog+0x148/0x230)
>  r3:cdc0c000 r2:c042b2b0
> [<c02c78bc>] (dev_watchdog+0x0/0x230) from [<c007cc1c>] 
> (run_timer_softirq+0x130/0x1c8)
>  r6:00000100 r5:c0456000 r4:c04b7c40
> [<c007caec>] (run_timer_softirq+0x0/0x1c8) from [<c00777b4>] 
> (__do_softirq+0x84/0x114)
> [<c0077730>] (__do_softirq+0x0/0x114) from [<c0077ba4>] (irq_exit+0x48/0x98)
> [<c0077b5c>] (irq_exit+0x0/0x98) from [<c003f07c>] (asm_do_IRQ+0x7c/0x9c)
> [<c003f000>] (asm_do_IRQ+0x0/0x9c) from [<c033aff4>] (__irq_svc+0x34/0xa0)
> Exception stack(0xc0457f38 to 0xc0457f80)
> 7f20:                                                       fe500000 fe600000
> 7f40: 00000a5d c0496954 00000816 c045a074 c0496608 c045a06c 80000000 413fc082
> 7f60: 0000001f c0457f94 c0457f80 c0457f80 c005a354 c005a360 a0000013 ffffffff
>  r5:fa200000 r4:ffffffff
> [<c005a2e4>] (ti81xx_idle+0x0/0x90) from [<c004a66c>] (cpu_idle+0x50/0x90)
>  r4:c0456000 r3:c005a2e4
> [<c004a61c>] (cpu_idle+0x0/0x90) from [<c032d8dc>] (rest_init+0x60/0x78)
>  r6:c06d0900 r5:c002dd50 r4:c04babbc r3:00000000
> [<c032d87c>] (rest_init+0x0/0x78) from [<c0008c08>] (start_kernel+0x264/0x2b8)
> [<c00089a4>] (start_kernel+0x0/0x2b8) from [<80008048>] (0x80008048)
> ---[ end trace 0fe0781f0790e729 ]---
> igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
