On 10/25/2011 12:26 AM, Flavio Leitner wrote:
> On Mon, 24 Oct 2011 16:26:28 +0800
> Michael Wang<wang...@linux.vnet.ibm.com> wrote:
>
>> On 10/21/2011 10:03 PM, Flavio Leitner wrote:
>>> On Fri, 21 Oct 2011 14:15:12 +0800
>>> Michael Wang<wang...@linux.vnet.ibm.com> wrote:
>>>
>>>> On 10/19/2011 08:16 PM, Flavio Leitner wrote:
>>>>> On Wed, 19 Oct 2011 12:49:48 +0800
>>>>> wangyun<wang...@linux.vnet.ibm.com> wrote:
>>>>>
>>>>>> Hi, Flavio
>>>>>>
>>>>>> I am new to the community and currently work on the e1000e driver.
>>>>>> I found something strange in this issue, please check below.
>>>>>>
>>>>>> Thanks,
>>>>>> Michael Wang
>>>>>>
>>>>>> On 10/18/2011 10:42 PM, Flavio Leitner wrote:
>>>>>>> On Mon, 17 Oct 2011 11:48:22 -0700
>>>>>>> Jesse Brandeburg<jesse.brandeb...@intel.com> wrote:
>>>>>>>
>>>>>>>> On Fri, 14 Oct 2011 10:04:26 -0700
>>>>>>>> Flavio Leitner<f...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> TDH is probably not moving due to the writeback threshold settings in
>>>>>>>> TXDCTL. netperf UDP_RR test is likely a good way to test this.
>>>>>>>>
>>>>>>> Yeah, makes sense. I haven't heard about new events after having removed
>>>>>>> the flag FLAG2_DMA_BURST. Unfortunately, I don't have access to the exact
>>>>>>> same hardware and I haven't reproduced the issue in-house yet with another
>>>>>>> 82571EB. See below about interface statistics from sar.
>> Currently, if FLAG2_DMA_BURST is set, the device will pre-fetch
>> tx descriptors only when:
>>
>> 1. The number of descriptors cached by the device is lower than 32.
>> 2. The host has prepared at least one descriptor.
>>
>> I don't think this will cause that issue, but another thing it does is to
>> set the device to write back processed descriptors only when their
>> number reaches 5 (or 4).
>>
>> So maybe the device gets a descriptor and processes it, but because the
>> number has not reached 5 it doesn't write it back, although it has
>> actually already been transmitted.
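To make the scenario quoted above concrete, here is a toy model of a write-back threshold. It is only a sketch of the idea, not e1000e or hardware code: struct toy_txq and both functions are made-up names, and the real device has more state than this.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not real e1000e or hardware state. */
struct toy_txq {
	unsigned int wthresh;      /* write back once this many are done */
	unsigned int done_pending; /* transmitted, not yet written back */
	unsigned int written_back; /* completions the driver can see */
};

/* The "device" transmits one descriptor and writes back only at the threshold. */
void toy_transmit_one(struct toy_txq *q)
{
	assert(q->wthresh >= 1);
	q->done_pending++;
	if (q->done_pending >= q->wthresh) {
		q->written_back += q->done_pending;
		q->done_pending = 0;
	}
}

/* What a hang check would see: something was sent, nothing was reported. */
int toy_looks_hung(const struct toy_txq *q)
{
	return q->done_pending > 0;
}
```

With wthresh = 5, a single toy_transmit_one() leaves written_back at 0, so the descriptor looks stuck even though it went out. With wthresh = 1 the same call is reported immediately.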
>>
> That could explain the issue and the fact that sometimes the hang
> info printed shows empty ring (write-back happened in the middle).
>
>> But this will happen only when the transmit suddenly stops for one
>> second or more. I don't know whether this is the real traffic situation
>> or not.
>>
> At least for one customer the interface had almost no traffic.
> I will go over all the data again checking if this happens every time.
>
>
>> Maybe I am wrong about this, but I also think this may be the only
>> reason that can cause this issue.
>>
> I am seeing this based on the debugging output:
>
>>>>> This is the full output with debugging patch applied:
>>>>> Oct 11 02:03:52 kernel: e1000e 0000:22:00.1: eth7: Detected Hardware Unit Hang:
>>>>> Oct 11 02:03:52 kernel: TDH<25>
>>>>> Oct 11 02:03:52 kernel: TDT<26>
>>>>> Oct 11 02:03:52 kernel: next_to_use<26>
>>>>> Oct 11 02:03:52 kernel: next_to_clean<25>
>>>>> Oct 11 02:03:52 kernel: buffer_info[next_to_clean]:
>>>>> Oct 11 02:03:52 kernel: time_stamp<100b2aa22>
>>>>> Oct 11 02:03:52 kernel: next_to_watch<25>
>>>>> Oct 11 02:03:52 kernel: jiffies<100b2ab25>
>>>>> Oct 11 02:03:52 kernel: next_to_watch.status<0>
>>>>> Oct 11 02:03:52 kernel: stored_i =<25>
>>>>> Oct 11 02:03:52 kernel: stored_first =<25>
>>>>> Oct 11 02:03:52 kernel: stamp =<100b2aa22>
>>>>> Oct 11 02:03:52 kernel: factor =<fa>
>>>>> Oct 11 02:03:52 kernel: last_clean =<100b2aa1a>
>>>>> Oct 11 02:03:52 kernel: last_tx =<100b2aa22>
>>>>> Oct 11 02:03:52 kernel: count =<0>/<100>
> Notice above that buffer_info time_stamp is the same as in
> last_tx (last time the xmit function was called), also that
> last_clean (last time the clean function was called) is before
> that. Therefore, the system sent just one descriptor in about
> 1 second confirming your idea.
>
>
>> So have you tried Red Hat 6? Does this problem still
>> exist there?
>>
> Actually, I received a few other reports that look like the same
> issue but with 6.2.
> As far as I can tell, hardware that was working
> just fine started to show it after the kernel upgrade (coincidentally
> 5.7 and 6.2 introduces FLAG2_DMA_BURST). However, I haven't heard
> anything back since I had provided the instrumented kernel to confirm
> to you. I will follow up as soon as I hear something.
>
> Assuming that your idea is true, the hang detection is broken because
> it's possible to have a descriptor apparently stuck that is just missing
> the write-back. So, is it possible to set a timer to write-back? If yes,
> it could expire and run before the hang detection period expires. Or
> perhaps force the write-back to happen before hang detection execution.
>
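As a side note, the timestamps in the debugging output quoted earlier can be checked with a few lines of C. This is only a sketch: it assumes that "factor" in the debugging patch is the hang threshold in jiffies, and jiffies_after() is a hypothetical userspace stand-in for the kernel's time_after(), not driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for time_after(a, b): true if a is later than b.
 * The signed difference keeps the comparison correct across wrap-around.
 */
int jiffies_after(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/* Assumption: "factor" is the number of jiffies allowed before a hang is reported. */
int hang_check_fires(uint64_t now, uint64_t time_stamp, uint64_t factor)
{
	return jiffies_after(now, time_stamp + factor);
}

/* Values copied from the log: jiffies, time_stamp, last_clean, last_tx, factor. */
void check_log_values(void)
{
	const uint64_t now = 0x100b2ab25ULL, stamp = 0x100b2aa22ULL;
	const uint64_t last_clean = 0x100b2aa1aULL, last_tx = 0x100b2aa22ULL;

	assert(now - stamp == 0x103);      /* 259 jiffies since the last xmit */
	assert(last_tx - last_clean == 8); /* clean ran 8 jiffies before that */
	assert(hang_check_fires(now, stamp, 0xfa));
}
```

Under that assumption the descriptor was 259 jiffies old against a threshold of 250, so the check fired 9 jiffies after the limit, with no clean run in between.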
According to the code "ew32(TIDV, adapter->tx_int_delay);", I think such a
timer is already set, but I don't know whether tx_int_delay still has its
default value, which is 8 (in units of 1.024 μs). As I understand TIDV, when
that time expires the device is forced to flush the pending write-backs.
The default value is far less than 1 second, so the timer itself cannot have
caused this issue.

> Customer has a test system reproducing this with 5.7, we can test
> patches there if you like. Just let me know.
>
> thank you!
> fbl
>

Maybe you can just search for the macro "E1000_TXDCTL_DMA_BURST_ENABLE" in
"drivers/net/e1000e/e1000.h" and change it to:

#define E1000_TXDCTL_DMA_BURST_ENABLE                          \
	(E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
	 E1000_TXDCTL_COUNT_DESC |                             \
	 (0 << 16) | /* wthresh must be +1 more than desired */\
	 (1 << 8)  | /* hthresh */                             \
	 0x1f)       /* pthresh */

This should make the device do the write-back even when only one descriptor
has been processed. If that solves the problem, we can think about a proper
solution.

Thanks,
Michael Wang
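For anyone who wants to check the resulting register value before building a test kernel, here is a rough userspace sketch of how the macro breaks down into fields. The shift positions come from the macro itself. The 6-bit field width and the two flag values are what I recall from the driver headers, so please treat them as assumptions and compare with your tree. All sk_ and SK_ names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values; verify against the e1000e headers in your tree. */
#define SK_TXDCTL_GRAN       0x01000000u
#define SK_TXDCTL_COUNT_DESC 0x00400000u

struct sk_txdctl {
	unsigned int pthresh; /* prefetch threshold, bits 0..5 (assumed width) */
	unsigned int hthresh; /* host threshold, starting at bit 8 */
	unsigned int wthresh; /* write-back threshold, starting at bit 16 */
};

/* Build the burst value the way the macro does, with a chosen wthresh. */
uint32_t sk_txdctl_burst(unsigned int wthresh)
{
	assert(wthresh <= 0x3f);
	return SK_TXDCTL_GRAN | SK_TXDCTL_COUNT_DESC |
	       ((uint32_t)wthresh << 16) | (1u << 8) | 0x1fu;
}

/* Split a TXDCTL value back into its three thresholds. */
struct sk_txdctl sk_txdctl_decode(uint32_t v)
{
	struct sk_txdctl f;

	f.pthresh = v & 0x3f;
	f.hthresh = (v >> 8) & 0x3f;
	f.wthresh = (v >> 16) & 0x3f;
	return f;
}

/* Convert TIDV ticks to nanoseconds, one tick being 1.024 us. */
uint64_t sk_tidv_to_ns(uint32_t ticks)
{
	return (uint64_t)ticks * 1024u;
}
```

With these assumptions, sk_txdctl_burst(5) gives 0x0145011f and sk_txdctl_burst(0) gives 0x0140011f, and sk_tidv_to_ns(8) is 8192 ns, which is why the default TIDV is so far below the one-second gap seen in the log.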