Dear e1000-devel,

Which kernel versions are people happily using in production with the
ixgbe driver?

I'm having network stability and performance issues with a modified
Red Hat EL6 2.6.32-131 kernel on a quad-core Xeon (Jasper Forest) CPU.
My NIC is a dual-port X520/82599.

I can't tell whether this is an ixgbe or an ioatdma problem. ixgbe is
not mentioned in my stack traces, but ioatdma is. I'm hoping for advice.

I could try a later kernel, especially one recommended by a
happy ixgbe user.

Any comments are much appreciated.

Here's what I see (just one CPU shown, for brevity). This has been
reported with an old version of ixgbe as well as with 3.9.15-NAPI.

ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
ioatdma 0000:00:0a.1: ioat2_timer_event: Channel halted (2)
BUG: scheduling while atomic: process_name/6888/0x10000301
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler sunrpc tcp_htcp sr_mod 
cdrom raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy 
async_tx dm_mod ses enclosure sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support 
e1000e ioatdma ixgbe(U) dca pm8001(U) libsas scsi_transport_sas ext3 jbd 
mbcache sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix [last 
unloaded: scsi_wait_scan]
Pid: 6888, comm: process_name Not tainted 2.6.32-foo-0 #7
Call Trace:
 <IRQ>  [<ffffffff8104dab6>] ? __schedule_bug+0x66/0x70
 [<ffffffff81477502>] ? thread_return+0x5db/0x779
 [<ffffffff8104f05d>] ? scheduler_tick+0xdd/0x280
 [<ffffffff810128e9>] ? read_tsc+0x9/0x20
 [<ffffffff81090d03>] ? ktime_get+0x63/0xe0
 [<ffffffff81029a2d>] ? lapic_next_event+0x1d/0x30
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffff8105748a>] ? __cond_resched+0x2a/0x40
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffff814777f0>] ? _cond_resched+0x30/0x40
 [<ffffffff8100df96>] ? is_valid_bugaddr+0x16/0x40
 [<ffffffff8124e4df>] ? report_bug+0x1f/0xc0
 [<ffffffff8100f2af>] ? die+0x7f/0x90
 [<ffffffff8147a184>] ? do_trap+0xc4/0x160
 [<ffffffffa01c5330>] ? ioat2_timer_event+0x0/0x270 [ioatdma]
 [<ffffffffa01c5330>] ? ioat2_timer_event+0x0/0x270 [ioatdma]
 [<ffffffff8100ce55>] ? do_invalid_op+0x95/0xb0
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffff8105ff11>] ? vprintk+0x1d1/0x4f0
 [<ffffffff81028e89>] ? native_send_call_func_single_ipi+0x39/0x40
 [<ffffffff8109c081>] ? generic_exec_single+0xb1/0xc0
 [<ffffffff8100befb>] ? invalid_op+0x1b/0x20
 [<ffffffffa01c5330>] ? ioat2_timer_event+0x0/0x270 [ioatdma]
 [<ffffffffa01c558c>] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
 [<ffffffffa01c5579>] ? ioat2_timer_event+0x249/0x270 [ioatdma]
 [<ffffffff810128e9>] ? read_tsc+0x9/0x20
 [<ffffffff81071ea7>] ? run_timer_softirq+0x197/0x340
 [<ffffffff810676a1>] ? __do_softirq+0xc1/0x1d0
 [<ffffffff8100c26c>] ? call_softirq+0x1c/0x30
 <EOI>  [<ffffffff8100dea5>] ? do_softirq+0x65/0xa0
 [<ffffffff81067fe8>] ? local_bh_enable_ip+0x98/0xa0
 [<ffffffff814798fb>] ? _spin_unlock_bh+0x1b/0x20
 [<ffffffffa01c486f>] ? ioat2_cleanup_tasklet+0x8f/0xa0 [ioatdma]
 [<ffffffffa01c3743>] ? ioat2_is_complete+0x83/0xd0 [ioatdma]
 [<ffffffff8141c38f>] ? tcp_recvmsg+0x75f/0xe90
 [<ffffffff81476f75>] ? thread_return+0x4e/0x779
 [<ffffffff8143c55c>] ? inet_recvmsg+0x5c/0x90
 [<ffffffff813d53b3>] ? sock_recvmsg+0x133/0x160
 [<ffffffff81086100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109810e>] ? futex_wake+0x10e/0x120
 [<ffffffff8109a071>] ? do_futex+0x121/0xb00
 [<ffffffff8104ed13>] ? perf_event_task_sched_out+0x33/0x80
 [<ffffffff81168779>] ? fget_light+0x9/0x90
 [<ffffffff813d570e>] ? sys_recvfrom+0xee/0x180
 [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
 [<ffffffff81476f75>] ? thread_return+0x4e/0x779
 [<ffffffff8109aacb>] ? sys_futex+0x7b/0x170
 [<ffffffff8100c5d5>] ? math_state_restore+0x45/0x60
 [<ffffffff8100b132>] ? system_call_fastpath+0x16/0x1b
------------[ cut here ]------------
kernel BUG at drivers/dma/ioat/dma_v2.c:315!

In my sources that line is in ioat2_timer_event. It looks like the
driver is asserting that a channel programming error was made elsewhere,
i.e. that the bad setup happened before the timer ran:

	/* when halted due to errors check for channel
	 * programming errors before advancing the completion state
	 */
	if (is_ioat_halted(status)) {
		u32 chanerr;

		chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
		dev_err(to_dev(chan), "%s: Channel halted (%x)\n",
			__func__, chanerr);
		if (test_bit(IOAT_RUN, &chan->state))
			BUG_ON(is_ioat_bug(chanerr));
		else /* we never got off the ground */
			return;
	}
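
A minimal sketch for tallying the "Channel halted, chanerr = ..." lines
from the log above per PCI device and chanerr value, which would show
whether the halts are confined to a single channel. The function name
tally_halts and the script name in the usage line are hypothetical, and
the pattern assumes only the message format seen in the log above.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
# The "ioat2_timer_event: Channel halted (2)" variant is deliberately
# not matched, so each halt is counted once.
HALT_RE = re.compile(
    r"ioatdma (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Channel halted, chanerr = (?P<err>[0-9a-fA-F]+)"
)


def tally_halts(lines):
    """Count halt messages, keyed by (PCI device, chanerr as printed)."""
    counts = Counter()
    for line in lines:
        m = HALT_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("err"))] += 1
    return counts


if __name__ == "__main__":
    # Read kernel log text from stdin and print one summary line per key.
    for (dev, err), n in sorted(tally_halts(sys.stdin).items()):
        print(f"{dev} chanerr={err}: {n}")
```

Usage would be something like "dmesg | python3 tally_halts.py".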

Thanks much,


_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
