Normally other issues such as ASPM would show up as Rx missed errors
without the rx_no_dma_resources errors, since ASPM affects DMA latency
rather than CPU performance.
One other thing that occurred to me that you might want to check is the
interrupt moderation configuration. This can be controlled via the "ethtool
-C/-c" interface. If I recall correctly, the rx-usecs value defaults to 1,
which enables dynamic interrupt moderation. One thing you might try is
setting it to a static value such as 40us to see if that helps to reduce the
drops.
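A rough sketch of checking and changing that setting (the interface name
eth2 is just an example; substitute your own):

```shell
# Show the current interrupt moderation (coalescing) settings
ethtool -c eth2

# Pin rx-usecs to a static 40us instead of the dynamic default of 1
ethtool -C eth2 rx-usecs 40
```

Note the change does not persist across reboots unless you add it to your
network scripts.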
Thanks,
Alex
From: Scott Silverman [mailto:ssilver...@simplexinvestments.com]
Sent: Tuesday, December 24, 2013 10:09 AM
To: Brandeburg, Jesse
Cc: Duyck, Alexander H; e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] rx_no_dma_resources - Issue on newer hardware (not
on older hardware)
I haven't been able to get a system out on the older hardware running CentOS6
yet.
In the meantime I did want to confirm that, according to turbostat (and i7z),
my cores never leave C0/C1. They also stay at a consistent frequency
(3.0-3.2GHz, depending on the processor). I am fairly confident that the
information reported by those tools is accurate and that there are no
sleep/wakeup issues in terms of CPU power management.
Are there other sleep/wake issues on the newer hardware I need to be aware of,
other than the CPU power state? As far as I know, ASPM is also disabled (as
reported by "lspci -vv": LnkCtl: ASPM Disabled).
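For reference, a sketch of the two checks described above (the PCI bus
address 01:00.0 is an example; use the address of your NIC from lspci):

```shell
# Watch per-core C-state residency and frequency; prints a sample
# every few seconds by default (needs root)
turbostat

# Confirm ASPM is disabled on the NIC's PCIe link
lspci -vv -s 01:00.0 | grep -i aspm
```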
Thanks,
Scott Silverman | IT | Simplex Investments | 312-360-2444
230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
On Thu, Dec 19, 2013 at 5:32 PM, Brandeburg, Jesse
<jesse.brandeb...@intel.com<mailto:jesse.brandeb...@intel.com>> wrote:
Scott, be sure to try running turbostat on both old and new servers, as I
suspect the 50us wake latency of the C6 power state may cause drops.
The new kernels enable deeper sleep.
You can also try a BIOS setting to disable deep sleep states and leave only
C1 enabled.
There was a program called cpudmalatency.c or something similar that may be
able to help you keep the system more awake.
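If I remember right, that program just holds /dev/cpu_dma_latency open with
a low latency target; the kernel honors the request only while the file
stays open. The same effect can be sketched in shell (needs root):

```shell
# Request a 0us PM QoS DMA latency target; it is honored only while
# file descriptor 3 remains open
exec 3> /dev/cpu_dma_latency
echo -n 0 >&3

# ... run the latency-sensitive workload here ...

# Closing the descriptor restores the previous power-management behavior
exec 3>&-
```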
--
Jesse Brandeburg
On Dec 19, 2013, at 2:57 PM, "Scott Silverman"
<ssilver...@simplexinvestments.com<mailto:ssilver...@simplexinvestments.com>>
wrote:
> Alex,
>
> Thanks for the response, I'll attempt to reproduce with a consistent OS
> release and re-open the discussion at that time.
>
> Thanks,
>
> Scott Silverman
>
>
> On Thu, Dec 19, 2013 at 4:52 PM, Alexander Duyck <
> alexander.h.du...@intel.com<mailto:alexander.h.du...@intel.com>> wrote:
>
>> On 12/19/2013 10:31 AM, Scott Silverman wrote:
>>> We have three generations of servers running nearly identical software.
>>> Each subscribes to a variety of multicast groups taking in, on average,
>>> 200-300Mbps of data.
>>>
>>> The oldest generation (2x Xeon X5670, SuperMicro 6016T-NTRF, Intel
>>> X520-DA2) has no issues handling all the incoming data. (zero
>>> rx_no_dma_resources)
>>>
>>> The middle generation (2x Xeon E5-2670, SuperMicro 6017R-WRF, Intel
>>> X520-DA2) and the newest generation (2x Xeon E5-2680v2, SuperMicro
>>> 6017R-WRF, Intel X520-DAs) both have issues handling the incoming data
>>> (indicated by increasing rx_no_dma_resources counter).
>>>
>>> The oldest generation of servers is running CentOS5 on a newer kernel
>>> (3.4.41), the others are running CentOS6 on the exact same kernel
>> (3.4.41).
>>>
>>> The oldest generation is using ixgbe 3.13.10, the middle generation
>> 3.13.10
>>> and the newest are on 3.18.7. All machines are using the set_irq_affinity
>>> script to spread queue interrupts across available cores. All machines
>> are
>>> configured with C1 as the maximum C-state and CPU clocks are all steady
>>> between 3.0-3.2GHz, depending on the processor model.
>>>
>>> On the middle/newer boxes, lowering the number of RSS queues manually
>> (i.e.
>>> RSS=8,8) seems to help reduce the amount of dropping, but it does not
>>> eliminate it.
>>>
>>> The ring buffer drops do not seem to correlate with data rates, either.
>> It
>>> does not seem that it is an issue of keeping up. In addition, the boxes
>> are
>>> not under particularly heavy load. The CPU usage is generally between
>> 3-5%
>>> and rarely spikes much higher than 15%. The load average is generally
>>> around 2.
>>>
>>> I am at a loss for what else to try to diagnose and/or fix this. In my
>>> mind, the newer boxes should have no problem at all keeping up with the
>>> older ones.
>>>
>>> I've attached the output of ethtool -S, one from each generation of
>> server.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Scott Silverman
>>
>> Scott,
>>
>> Have you tried running the CentOS5 w/ newer kernel on any of your newer
>> servers, or CentOS6 on one of the older ones? I ask because this seems
>> to be one of the most significant differences between the servers that
>> are not dropping frames and those that are. I suspect something in the
>> CentOS6 configuration, not present in the CentOS5 configuration, is
>> responsible for the drops. We really need to rule out any OS-based
>> issues before we can even hope to start chasing this issue down into
>> the driver and/or device configuration.
>>
>> Thanks,
>>
>> Alex
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired