Re: [E1000-devel] rx_no_dma_resources - Issue on newer hardware (not on older hardware)

Scott Silverman Thu, 23 Jan 2014 13:05:09 -0800

I now have one of the older (dual Xeon X5670) systems running CentOS6 like
the newer hardware. It remains free of any drops incrementing the
"rx_no_dma_resources" counter. The newer (E5-2670 and E5-2680 v2) hardware
still drops.
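
For anyone comparing boxes the same way, a minimal Python sketch of how the
counter can be tracked between two "ethtool -S" snapshots follows. The
function names and the sample numbers are made up for illustration; only the
"name: value" line layout of ethtool -S and the counter name come from this
thread.

```python
import re

COUNTER = "rx_no_dma_resources"


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style 'name: value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(-?\d+)\s*$", line)
        if match:  # header lines such as 'NIC statistics:' do not match
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def counter_delta(before_text, after_text, name=COUNTER):
    """Return how much `name` grew between two snapshots (0 if it is absent)."""
    before = parse_ethtool_stats(before_text)
    after = parse_ethtool_stats(after_text)
    return after.get(name, 0) - before.get(name, 0)


# Hypothetical snapshots for illustration only, not real measurements.
SAMPLE_BEFORE = """NIC statistics:
     rx_packets: 1000
     rx_no_dma_resources: 5
"""
SAMPLE_AFTER = """NIC statistics:
     rx_packets: 2500
     rx_no_dma_resources: 12
"""

if __name__ == "__main__":
    print(COUNTER, "grew by", counter_delta(SAMPLE_BEFORE, SAMPLE_AFTER))
```

In practice the two snapshots would be captured some seconds apart on the
same interface, so that the delta can be compared against the data rate over
the same window.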


Various tuning measures have had varying amounts of success in reducing the
number of drops on the newer hardware (things like limiting RSS to the
number of physical cores on the CPU package connected to the NIC, turning
off ATR, using numactl to move processes closer to interrupts, etc) but
none of them have been necessary on the older "slower" hardware.
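
As a rough sketch of how the CPUs on the package attached to the NIC could be
found, assuming the standard Linux sysfs layout (the "numa_node" file under
the network device and the per-node "cpulist" file). The function names are
hypothetical, and the list includes hyperthread siblings, so it can be larger
than the number of physical cores.

```python
from pathlib import Path


def parse_cpulist(cpulist):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in cpulist.strip().split(","):
        if not part:
            continue  # empty string means no CPUs listed
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def nic_local_cpus(iface, sysfs=Path("/sys")):
    """Return the CPU ids on the NUMA node the NIC is attached to, or None."""
    node_file = sysfs / "class/net" / iface / "device/numa_node"
    node = int(node_file.read_text())
    if node < 0:
        return None  # the kernel reports -1 when no NUMA affinity is known
    cpulist_file = sysfs / "devices/system/node" / f"node{node}" / "cpulist"
    return parse_cpulist(cpulist_file.read_text())
```

The resulting list can then be used to pick the RSS queue count and to choose
where to pin interrupts and processes (for example with numactl).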

All systems (new and old) have C-states deeper than C1 disabled, so they use
only C0 and C1. turbostat reports that they stay consistently at their turbo
frequencies, all right around 3 GHz.
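
For completeness, the "keep the system more awake" approach suggested earlier
in the thread can be sketched in Python using the kernel's PM QoS device,
/dev/cpu_dma_latency. This is a minimal illustration, not the program Jesse
mentioned; the function names are hypothetical. The kernel honors the
requested latency only while the file descriptor stays open, and opening the
device normally requires root.

```python
import os
import struct

CPU_DMA_LATENCY = "/dev/cpu_dma_latency"


def encode_latency(usecs):
    """Pack the latency target as a native 32-bit signed integer."""
    return struct.pack("i", usecs)


def hold_cpu_dma_latency(usecs=0, path=CPU_DMA_LATENCY):
    """Write the latency target and return the open fd.

    The request is dropped as soon as the fd is closed, so the caller
    must keep it open for as long as the constraint is wanted.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, encode_latency(usecs))
    except OSError:
        os.close(fd)
        raise
    return fd
```

A caller would keep the returned descriptor for the life of the process and
release the constraint with os.close(fd).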

Setting rx-usecs to 0, which disables interrupt moderation, seems to have
reduced the drops a bit, but I can't say that conclusively yet.
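
A small Python sketch of applying an rx-usecs setting through "ethtool -C"
follows. The helper names and the interface name "eth2" in the usage comment
are placeholders, and the values shown (0 and the static 40us suggested
earlier in the thread) are just the ones discussed here.

```python
import subprocess


def coalesce_cmd(iface, rx_usecs):
    """Build the ethtool argv that sets rx-usecs on one interface."""
    if rx_usecs < 0:
        raise ValueError("rx-usecs must be non-negative")
    return ["ethtool", "-C", iface, "rx-usecs", str(rx_usecs)]


def set_rx_usecs(iface, rx_usecs):
    """Run ethtool; raises CalledProcessError if the command fails."""
    subprocess.run(coalesce_cmd(iface, rx_usecs), check=True)


# Example (needs root and a real interface name):
#   set_rx_usecs("eth2", 0)    # interrupt moderation disabled
#   set_rx_usecs("eth2", 40)   # static 40us moderation
```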




Thanks,

Scott Silverman | IT | Simplex Investments | 312-360-2444
230 S. LaSalle St., Suite 4-100, Chicago, IL 60604


On Thu, Dec 26, 2013 at 10:15 AM, Duyck, Alexander H <
alexander.h.du...@intel.com> wrote:

>  Normally any other issues such as ASPM would show up as Rx missed errors
> without the no_dma_resources error.  This is because ASPM normally affects
> DMA latency, not CPU performance.
>
>
>
> One other thing that occurred to me that you might want to check is the
> interrupt moderation configuration.  This can be controlled via the
> “ethtool -C/-c” interface.  Normally the rx-usecs value is defaulted to 1
> if I recall which is a dynamic interrupt moderation value.  One thing you
> might try is setting it to a static value such as 40us to see if this helps
> to reduce the drops.
>
>
>
> Thanks,
>
>
>
> Alex
>
>
>
> *From:* Scott Silverman [mailto:ssilver...@simplexinvestments.com]
> *Sent:* Tuesday, December 24, 2013 10:09 AM
> *To:* Brandeburg, Jesse
> *Cc:* Duyck, Alexander H; e1000-devel@lists.sourceforge.net
> *Subject:* Re: [E1000-devel] rx_no_dma_resources - Issue on newer
> hardware (not on older hardware)
>
>
>
> I haven't been able to get a system out on the older hardware running
> CentOS6 yet.
>
>
>
> In the meantime I did want to confirm that, according to turbostat (and
> i7z) my cores never leave C0/C1. They also stay at a consistent frequency
> (3.0-3.2 GHz depending on the processor). I am fairly confident that the
> information reported by those tools is accurate and that there are no
> sleep/wakeup issues in terms of CPU power management.
>
>
>
> Are there other sleep/wake issues on the newer hardware I need to be aware
> of, other than the CPU power state? As far as I know, ASPM is also disabled
> (as reported by lspci -vv LnkCtl: ASPM Disabled).
>
>
>
>
>
>
>
>
> Thanks,
>
>
>
> Scott Silverman | IT | Simplex Investments | 312-360-2444
>
> 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>
>
>
> On Thu, Dec 19, 2013 at 5:32 PM, Brandeburg, Jesse <
> jesse.brandeb...@intel.com> wrote:
>
> Scott be sure to try running turbostat on both old and new servers as I
> suspect the 50us wake latency of C6 power state may cause drops.
>
> The new kernels enable deeper sleep.
>
> You can also try a bios setting to disable deep sleep states, leave on C1
> only.
>
> There was a program called cpudmalatency.c or something that may be able
> to help you keep system more awake.
>
> --
> Jesse Brandeburg
>
>
>
> On Dec 19, 2013, at 2:57 PM, "Scott Silverman" <
> ssilver...@simplexinvestments.com> wrote:
>
> > Alex,
> >
> > Thanks for the response, I'll attempt to reproduce with a consistent OS
> > release and re-open the discussion at that time.
> >
> >
> >
> >
> >
> >
> > Thanks,
> >
> > Scott Silverman
> >
> >
> > On Thu, Dec 19, 2013 at 4:52 PM, Alexander Duyck <
> > alexander.h.du...@intel.com> wrote:
> >
> >> On 12/19/2013 10:31 AM, Scott Silverman wrote:
> >>> We have three generations of servers running nearly identical software.
> >>> Each subscribes to a variety of multicast groups taking in, on average,
> >>> 200-300Mbps of data.
> >>>
> >>> The oldest generation (2x Xeon X5670, SuperMicro 6016T-NTRF, Intel
> >>> X520-DA2) has no issues handling all the incoming data. (zero
> >>> rx_no_dma_resources)
> >>>
> >>> The middle generation (2x Xeon E5-2670, SuperMicro 6017R-WRF, Intel
> >>> X520-DA2) and the newest generation (2x Xeon E5-2680v2, SuperMicro
> >>> 6017R-WRF, Intel X520-DAs) both have issues handling the incoming data
> >>> (indicated by increasing rx_no_dma_resources counter).
> >>>
> >>> The oldest generation of servers is running CentOS5 on a newer kernel
> >>> (3.4.41), the others are running CentOS6 on the exact same kernel
> >>> (3.4.41).
> >>>
> >>> The oldest generation is using ixgbe 3.13.10, the middle generation
> >>> 3.13.10 and the newest are on 3.18.7. All machines are using the
> >>> set_irq_affinity script to spread queue interrupts across available
> >>> cores. All machines are configured with C1 as the maximum C-state and
> >>> CPU clocks are all steady between 3-3.2 GHz depending on the processor
> >>> model.
> >>>
> >>> On the middle/newer boxes, lowering the number of RSS queues manually
> >>> (i.e. RSS=8,8) seems to help reduce the amount of dropping, but it does
> >>> not eliminate it.
> >>>
> >>> The ring buffer drops do not seem to correlate with data rates, either.
> >>> It does not seem that it is an issue of keeping up. In addition, the
> >>> boxes are not under particularly heavy load. The CPU usage is generally
> >>> between 3-5% and rarely spikes much higher than 15%. The load average is
> >>> generally around 2.
> >>>
> >>> I am at a loss for what else to try to diagnose and/or fix this. In my
> >>> mind, the newer boxes should have no problem at all keeping up with the
> >>> older ones.
> >>>
> >>> I've attached the output of ethtool -S, one from each generation of
> >>> server.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Scott Silverman
> >>
> >> Scott,
> >>
> >> Have you tried running the CentOS5 w/ newer kernel on any of your newer
> >> servers, or CentOS6 on one of the older ones?  I ask because this would
> >> seem to be one of the most significant differences between the
> >> servers that are not dropping frames and those that are.  I suspect you
> >> may have something in the CentOS6 configuration that is responsible for
> >> the drops that is not present in the CentOS5 configuration.  We really
> >> need to eliminate any OS based issues before we can really even hope to
> >> start chasing this issue down into the driver and/or device
> >> configuration.
> >>
> >> Thanks,
> >>
> >> Alex
>

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
