It also seems that when the drops occur and the counter is incrementing,
the ksoftirqd process chews up CPU time at the same moment. We see little
or no CPU usage by ksoftirqd on the old hardware. I do not know how to
determine what softirq work is being passed off to ksoftirqd, or what
work it is doing.
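One rough way to answer that is to watch the per-CPU counters in /proc/softirqs; whichever rows grow fastest while ksoftirqd is busy are the work it is servicing. A minimal sketch, assuming a Linux box (NET_RX is the receive-path softirq):

```shell
# Sample the softirq counters twice, a few seconds apart; comparing
# the two samples shows which softirq types (and which CPUs) are hot.
# On a loaded receiver the NET_RX row usually dominates.
grep -E 'CPU|NET_RX|NET_TX' /proc/softirqs
sleep 5
grep -E 'CPU|NET_RX|NET_TX' /proc/softirqs
```

`watch -d -n1 cat /proc/softirqs` shows the same counters interactively, highlighting the ones that change.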




Thanks,

Scott Silverman | IT | Simplex Investments | 312-360-2444
230 S. LaSalle St., Suite 4-100, Chicago, IL 60604


On Tue, Feb 4, 2014 at 12:25 PM, Scott Silverman <
ssilver...@simplexinvestments.com> wrote:

> The BIOS on my system (X9DRW-iF) does allow for me to "disable" the Ageing
> Timer Rollover. I may try that on a less important system and report back.
>
> I am aware of the locality of the PCIe on the new systems, and we have
> been able to reduce the severity of the problem by limiting the number of
> RSS queues to the number of available CPUs on the package that is connected
> to the PCIe slots, and then binding those queues' interrupt handlers to the
> appropriate CPUs (details about this earlier in this thread). Even with
> this, we still experienced dropping, albeit a reduced amount. In the case
> of my hardware, all of the available PCIe slots are adjacent to the first
> CPU package (confirmed through the manufacturer documentation, as well as
> hwloc).
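For readers who missed the earlier details, that binding boils down to writing a hex CPU mask into each queue interrupt's smp_affinity file. A dry-run sketch that only prints the intended mapping (the interface name eth2 and the one-queue-per-CPU layout are placeholder assumptions):

```shell
# Print the IRQ -> CPU assignments that would pin each queue interrupt
# of $IFACE to its own CPU. Applying an assignment is a root-only write:
#   echo <mask> > /proc/irq/<irq>/smp_affinity
IFACE=eth2   # placeholder; use the real interface name
cpu=0
grep "$IFACE" /proc/interrupts | while read -r line; do
    irq=$(echo "${line%%:*}" | tr -d ' ')   # IRQ number before the colon
    mask=$(printf '%x' $((1 << cpu)))       # one affinity bit per CPU
    echo "irq $irq -> cpu $cpu (mask $mask)"
    cpu=$((cpu + 1))
done
```

This is the same idea as the set_irq_affinity script mentioned later in the thread, restricted to CPUs on the package local to the slot.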
>
> DCA is interesting, because we do see that the older systems enable DCA
> (from ixgbe messages in dmesg) and the newer systems do not. My
> understanding was that the newer hardware, E5-26xx processors
> specifically, uses DDIO as a replacement for DCA. Is this correct? As DDIO
> is completely transparent, I am not aware of any way we can verify or
> otherwise confirm that DDIO is in fact working. I was also unable to
> determine whether DCA support should appear enabled on a DDIO-capable
> platform, so if you could confirm that as well I would appreciate it.
>
> I have attached the output of ethregs for a new and an old system.
>
>
>
>
>
> Thanks,
>
> Scott Silverman | IT | Simplex Investments | 312-360-2444
> 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>
>
> On Tue, Feb 4, 2014 at 11:24 AM, Alexander Duyck <
> alexander.h.du...@intel.com> wrote:
>
>> Scott,
>>
>> The fact that the rx_no_dma_resources counter is incrementing tells us
>> this is an issue with the CPU not being able to keep up. As such, I don't
>> believe that modifying PCIe settings such as the Max Read Size is likely
>> to improve performance. The issue in such cases is usually due to the
>> memory latency for processing packets.
>>
>> Does your BIOS give you the option to disable the Ageing Timer Rollover?
>> It is something you could test; however, I do not know what the effect
>> would be, as I don't have that option on any systems I have here.
>>
>> One key difference between the Xeon X5670 and the newer Xeons is the
>> fact that the PCIe controller is built into the newer Xeons. This can
>> have a few different side effects, one of which is that you will see
>> better performance on the "local" node, and worse performance on the
>> "remote" node as transactions have to go across QPI. Have you done much
>> work to sort out which CPUs would be local to the adapter in terms of
>> which node they are on? Simply changing the slot the adapter is in, or
>> which CPUs you place the interrupts on, may have a significant
>> performance impact with the newer Xeons.
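One quick way to check locality on Linux, besides hwloc, is sysfs, which exposes the NUMA node and local CPU list of the PCI device behind each interface (a sketch; -1 or a missing file means the kernel has no locality information):

```shell
# For every network interface backed by a PCI device, print the NUMA
# node it hangs off and the CPUs that are local to it.
for dev in /sys/class/net/*/device; do
    iface=$(basename "$(dirname "$dev")")
    node=$(cat "$dev/numa_node" 2>/dev/null || echo "n/a")
    cpus=$(cat "$dev/local_cpulist" 2>/dev/null || echo "n/a")
    echo "$iface: numa_node=$node local_cpulist=$cpus"
done
```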
>>
>> Another thing that might differ between the two platforms is a feature
>> known as DCA. It allows the network adapter to push data directly into
>> the LLC of the CPUs, which can provide a significant performance gain
>> when it is working correctly. It might be useful to obtain the ethregs
>> tool from e1000.sf.net and capture a dump of the registers for the
>> device on the old platform and the device on the new platform. With
>> this we could verify whether DCA is in use on both platforms and
>> whether the DCA tag maps for both platforms are valid.
>>
>> Thanks,
>>
>> Alex
>>
>> On 02/04/2014 07:45 AM, Scott Silverman wrote:
>> > I've tried some other tuning options (increasing the PCIe Max Read
>> > Size to 4096, for example) but I still haven't found a way to get
>> > Westmere (X5670) performance out of my Sandy/Ivy parts. It just seems
>> > wrong to me that where the X5670 has no problem keeping up, the newer,
>> > better, faster E5-2670 and E5-2680v2 just stumble and fall.
>> >
>> > What else can I do to diagnose an increasing rx_no_dma_resources
>> > counter? Is there some reason that an Intel X520-DA2 NIC should have
>> > this problem? Does Intel have a PCIe 3.0 part I should be trying?
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >
>> >
>> > On Wed, Jan 29, 2014 at 10:34 AM, Scott Silverman
>> > <ssilver...@simplexinvestments.com> wrote:
>> >
>> >     After a bit more time, it seems that reducing the ATR to 32us has
>> >     not fully removed the performance discrepancy. It has certainly
>> >     greatly reduced it, though. We still see (less frequent)
>> >     rx_no_dma_resources drops on the newer sandy/ivy-bridge hardware
>> >     that we just don't see on westmere boxes.
>> >
>> >     What is the effect of completely disabling "ATR"? Is it dangerous
>> >     to do so? Is there some other setting that is worth looking into
>> >     in terms of making this type of performance more similar to the
>> >     older hardware with regards to PCI-E transfers?
>> >
>> >
>> >
>> >
>> >     Thanks,
>> >
>> >     Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >     230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >
>> >
>> >     On Mon, Jan 27, 2014 at 10:18 AM, Duyck, Alexander H
>> >     <alexander.h.du...@intel.com> wrote:
>> >
>> >         Scott,
>> >
>> >         I'm not really the person to answer most of these questions
>> >         as my area of expertise is networking, not CPU design. My
>> >         advice at this point would be to look into QPI, the E5 Xeon
>> >         CPU architecture, and the CPU ring bus, as answering most of
>> >         these questions would likely require an understanding of
>> >         those topics.
>> >
>> >         Thanks,
>> >
>> >         Alex
>> >
>> >         *From:*Scott Silverman
>> >         [mailto:ssilver...@simplexinvestments.com]
>> >         *Sent:* Monday, January 27, 2014 7:26 AM
>> >         *To:* Duyck, Alexander H
>> >         *Cc:* Brandeburg, Jesse; e1000-devel@lists.sourceforge.net
>> >         *Subject:* Re: [E1000-devel] rx_no_dma_resources - Issue on
>> >         newer hardware (not on older hardware)
>> >
>> >         Alex,
>> >
>> >         That was the extent of the information I was able to gather
>> >         from my system's manual and BIOS help. I still don't really
>> >         understand, though.
>> >
>> >         -Why would there be a deadlock in the first place? My
>> >         understanding is that PCIe has multiple unidirectional links
>> >         used exclusively by each device, not a shared bus as PCI was.
>> >
>> >         -Why would we want to wait at all, why not resolve the
>> >         deadlock immediately? (And why would the newer systems have
>> >         only *longer* options for this compared to older systems with
>> >         2/4/32us options?)
>> >
>> >         -What is the downside of the shorter times? It's clear that
>> >         performance/latency is improved with a shorter duration, so
>> >         why not always use the shorter duration?
>> >
>> >         -Does the fact that we seem to experience a lot of these
>> >         deadlocks (resulting in degradation of NIC performance)
>> >         indicate some kind of a problem, or is that the expected
>> >         behavior of the NIC on a dedicated x8 PCI-E link? Is there
>> >         some way to reduce the number of experienced deadlocks, rather
>> >         than simply shortening the timer for resolving them?
>> >
>> >         It's certainly great that simply changing the setting for this
>> >         option has resolved the performance issue we had. However, it
>> >         is frustrating to not understand why it helps, or what the
>> >         other effects of changing that setting might be.
>> >
>> >
>> >         Thanks,
>> >
>> >         Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >
>> >         230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >
>> >         On Mon, Jan 27, 2014 at 9:17 AM, Alexander Duyck
>> >         <alexander.h.du...@intel.com> wrote:
>> >
>> >         Scott,
>> >
>> >         The definition for Ageing Timer Rollover should be available
>> >         in the user manual for your system. It is basically a
>> >         mechanism for determining how long the system should wait
>> >         before reallocating resources to PCIe after a resource
>> >         deadlock occurs.
>> >
>> >         Thanks,
>> >
>> >         Alex
>> >
>> >
>> >         On 01/27/2014 06:50 AM, Scott Silverman wrote:
>> >         > Alex,
>> >         >
>> >         > It certainly seems that adjusting this setting has resolved
>> >         the issue.
>> >         > I have been unable to find out much about what this setting
>> >         really
>> >         > controls (or why the available values vary so wildly between
>> >         v1 and v2
>> >         > of these chips). Can you explain any more about what it
>> >         does, why it
>> >         > helps and how you determined it might be of use in this
>> >         situation?
>> >         >
>> >         > I've all but confirmed that this change has resolved the
>> >         issue on my
>> >         > systems using the E5-2680 v2 chips. I have the same
>> >         motherboard using
>> >         > E5-2670 chips, and have not been able to determine if this
>> >         fix works
>> >         > there as well as they are running an older BIOS revision
>> >         that doesn't
>> >         > expose the option.
>> >         >
>> >         >
>> >         >
>> >         >
>> >         >
>> >         >
>> >         > Thanks,
>> >         >
>> >         > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >         > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >         >
>> >         >
>> >         > On Thu, Jan 23, 2014 at 4:00 PM, Scott Silverman
>> >         > <ssilver...@simplexinvestments.com> wrote:
>> >         >
>> >         > Answered my own question, from
>> >         > the xeon-e5-1600-2600-vol-2-datasheet.pdf where the hex values
>> >         > line up with the options I see. It seems my board (SuperMicro
>> >         > X9DRW-iF) defaults to 0x2 (128us). I will try 0x1 (32us) and
>> >         > report back.
>> >         >
>> >         > In the meantime, I've tried to do some googling to determine
>> >         what
>> >         > this function actually controls, but can't seem to find
>> anything
>> >         > helpful. Can you point me to a resource that describes what
>> this
>> >         > setting controls for my own understanding?
>> >         >
>> >         >
>> >         >
>> >         >
>> >         > Thanks,
>> >         >
>> >         > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >
>> >         > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >         >
>> >         >
>> >
>> >         > On Thu, Jan 23, 2014 at 3:53 PM, Scott Silverman
>> >         > <ssilver...@simplexinvestments.com> wrote:
>> >         >
>> >         > I do not have an "Extended ATR" setting but I do have a
>> >         > "Ageing Timer Rollover"(sp) setting.
>> >         >
>> >         > The default for that is 128us with options of: Disabled, 32us,
>> >         > 128us and 512us.
>> >         >
>> >         > According to the "xeon-35-family-spec-update.pdf" from Intel:
>> >         > (page 95)
>> >         > 0 Disabled
>> >         > 1 32us
>> >         > 2 4us
>> >         > 3 2us.
>> >         >
>> >         > As my options don't really match those on the spec, I thought
>> >         > I'd ask what you suggest I try here.
>> >         >
>> >         >
>> >         >
>> >         >
>> >         > Thanks,
>> >         >
>> >         > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >
>> >         > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >         >
>> >         >
>> >
>> >         > On Thu, Jan 23, 2014 at 3:25 PM, Alexander Duyck
>> >         > <alexander.h.du...@intel.com> wrote:
>> >         >
>> >         > One other thing you may want to check is your BIOS
>> >         > configuration.
>> >         > Specifically check to see if you have an option to modify
>> >         > a value called
>> >         > "Extended ATR" in your BIOS. It is usually somewhere in
>> >         > the advanced CPU options. The default value on many systems
>> >         > is 0x3 and we
>> >         > have seen that changing it to 0x1 can sometimes improve
>> >         > the system
>> >         > performance in cases such as this.
>> >         >
>> >         > Thanks,
>> >         >
>> >         > Alex
>> >         >
>> >         > On 01/23/2014 01:03 PM, Scott Silverman wrote:
>> >         > > I now have one of the older (dual Xeon X5670) systems
>> >         > running CentOS6
>> >         > > like the newer hardware. It remains free of any drops
>> >         > incrementing the
>> >         > > "rx_no_dma_resources" counter. The newer (E5-2670 and
>> >         > E5-2680 v2)
>> >         > > hardware still drops.
>> >         > >
>> >         > > Various tuning measures have had varying amounts of
>> >         > success in
>> >         > > reducing the number of drops on the newer hardware
>> >         > (things like
>> >         > > limiting RSS to the number of physical cores on the CPU
>> >         > package
>> >         > > connected to the NIC, turning off ATR, using numactl to
>> >         > move processes
>> >         > > closer to interrupts, etc) but none of them have been
>> >         > necessary on the
>> >         > > older "slower" hardware.
>> >         > >
>> >         > > All systems (new and old) have their C-states disabled
>> >         > and only use C0
>> >         > > and C1. turbostat reports that they stay, consistently,
>> >         > at their turbo
>> >         > > frequencies, all right around 3Ghz.
>> >         > >
>> >         > > Adjusting the rx-usecs value to 0, disabling interrupt
>> >         > moderation,
>> >         > > seems like it may have reduced the drops a bit, but I
>> >         > can't say that
>> >         > > conclusively yet.
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         > > Thanks,
>> >         > >
>> >         > > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >
>> >         > > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >         > >
>> >         > >
>> >
>> >         > > On Thu, Dec 26, 2013 at 10:15 AM, Duyck, Alexander H
>> >         > > <alexander.h.du...@intel.com> wrote:
>> >         > >
>> >         > > Normally any other issues such as ASPM would show up
>> >         > as Rx missed
>> >         > > errors without the no_dma_resources error. This is
>> >         > because ASPM
>> >         > > normally affects DMA latency, not CPU performance.
>> >         > >
>> >         > >
>> >         > >
>> >         > > One other thing that occurred to me that you might
>> >         > want to check
>> >         > > is the interrupt moderation configuration. This can
>> >         > be controlled
>> >         > > via the "ethtool -C/-c" interface. Normally the
>> >         > rx-usecs value is
>> >         > > defaulted to 1, if I recall, which is a dynamic
>> >         > interrupt moderation
>> >         > > value. One thing you might try is setting it to a
>> >         > static value
>> >         > > such as 40us to see if this helps to reduce the drops.
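For concreteness, checking and changing that looks roughly like the following; eth2 is a placeholder device name and the block is guarded so it degrades gracefully when ethtool or the device is absent:

```shell
# Show the current coalescing settings, then replace the dynamic
# default (rx-usecs 1) with a static 40us moderation value.
IFACE=eth2   # placeholder; substitute the real interface
if command -v ethtool >/dev/null 2>&1; then
    ethtool -c "$IFACE" || echo "no such device: $IFACE"
    ethtool -C "$IFACE" rx-usecs 40 || echo "could not set rx-usecs on $IFACE"
else
    echo "ethtool is not installed"
fi
```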
>> >         > >
>> >         > >
>> >         > >
>> >         > > Thanks,
>> >         > >
>> >         > >
>> >         > >
>> >         > > Alex
>> >         > >
>> >         > >
>> >         > >
>> >         > > *From:*Scott Silverman
>> >         > > [mailto:ssilver...@simplexinvestments.com]
>> >         > > *Sent:* Tuesday, December 24, 2013 10:09 AM
>> >         > > *To:* Brandeburg, Jesse
>> >         > > *Cc:* Duyck, Alexander H;
>> >         > > e1000-devel@lists.sourceforge.net
>> >         > > *Subject:* Re: [E1000-devel] rx_no_dma_resources -
>> >         > Issue on newer
>> >         > > hardware (not on older hardware)
>> >         > >
>> >         > >
>> >         > >
>> >         > > I haven't been able to get a system out on the older
>> >         > hardware
>> >         > > running CentOS6 yet.
>> >         > >
>> >         > >
>> >         > >
>> >         > > In the meantime I did want to confirm that,
>> >         > according to turbostat
>> >         > > (and i7z) my cores never leave C0/C1. They also stay
>> >         > at a
>> >         > > consistent frequency (3.0-3.2Ghz depending on the
>> >         > processor). I am
>> >         > > fairly confident that the information reported by
>> >         > those tools is
>> >         > > accurate and that there are no sleep/wakeup issues
>> >         > in terms of CPU
>> >         > > power management.
>> >         > >
>> >         > >
>> >         > >
>> >         > > Are there other sleep/wake issues on the newer
>> >         > hardware I need to
>> >         > > be aware of, other than the CPU power state? As far
>> >         > as I know,
>> >         > > ASPM is also disabled (as reported by lspci -vv
>> >         > LnkCtl: ASPM
>> >         > > Disabled).
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         > > Thanks,
>> >         > >
>> >         > >
>> >         > >
>> >         > > Scott Silverman | IT | Simplex Investments | 312-360-2444
>> >
>> >         > >
>> >         > > 230 S. LaSalle St., Suite 4-100, Chicago, IL 60604
>> >         > >
>> >         > >
>> >         > >
>> >         > > On Thu, Dec 19, 2013 at 5:32 PM, Brandeburg, Jesse
>> >         > > <jesse.brandeb...@intel.com> wrote:
>> >         > >
>> >         > > Scott, be sure to try running turbostat on both old
>> >         > and new servers
>> >         > > as I suspect the 50us wake latency of C6 power state
>> >         > may cause drops.
>> >         > >
>> >         > > The new kernels enable deeper sleep.
>> >         > >
>> >         > > You can also try a bios setting to disable deep
>> >         > sleep states,
>> >         > > leave on C1 only.
>> >         > >
>> >         > > There was a program called cpudmalatency.c or
>> >         > something that may
>> >         > > be able to help you keep the system more awake.
>> >         > >
>> >         > > --
>> >         > > Jesse Brandeburg
>> >         > >
>> >         > >
>> >         > >
>> >         > > On Dec 19, 2013, at 2:57 PM, "Scott Silverman"
>> >         > > <ssilver...@simplexinvestments.com> wrote:
>> >         > >
>> >         > > > Alex,
>> >         > > >
>> >         > > > Thanks for the response, I'll attempt to reproduce
>> >         > with a
>> >         > > consistent OS
>> >         > > > release and re-open the discussion at that time.
>> >         > > >
>> >         > > >
>> >         > > >
>> >         > > >
>> >         > > >
>> >         > > >
>> >         > > > Thanks,
>> >         > > >
>> >         > > > Scott Silverman
>> >         > > >
>> >         > > >
>> >         > > > On Thu, Dec 19, 2013 at 4:52 PM, Alexander Duyck <
>> >         > > > alexander.h.du...@intel.com> wrote:
>> >         > > >
>> >         > > >> On 12/19/2013 10:31 AM, Scott Silverman wrote:
>> >         > > >>> We have three generations of servers running
>> >         > nearly identical
>> >         > > software.
>> >         > > >>> Each subscribes to a variety of multicast groups
>> >         > taking in, on
>> >         > > average,
>> >         > > >>> 200-300Mbps of data.
>> >         > > >>>
>> >         > > >>> The oldest generation (2x Xeon X5670, SuperMicro
>> >         > 6016T-NTRF, Intel
>> >         > > >>> X520-DA2) has no issues handling all the
>> >         > incoming data. (zero
>> >         > > >>> rx_no_dma_resources)
>> >         > > >>>
>> >         > > >>> The middle generation (2x Xeon E5-2670,
>> >         > SuperMicro 6017R-WRF,
>> >         > > Intel
>> >         > > >>> X520-DA2) and the newest generation (2x Xeon
>> >         > E5-2680v2, SuperMicro
>> >         > > >>> 6017R-WRF, Intel X520-DA2) both have issues
>> >         > handling the
>> >         > > incoming data
>> >         > > >>> (indicated by increasing rx_no_dma_resources
>> >         > counter).
>> >         > > >>>
>> >         > > >>> The oldest generation of servers is running
>> >         > CentOS5 on a newer
>> >         > > kernel
>> >         > > >>> (3.4.41), the others are running CentOS6 on the
>> >         > exact same kernel
>> >         > > >> (3.4.41).
>> >         > > >>>
>> >         > > >>> The oldest generation is using ixgbe 3.13.10,
>> >         > the middle
>> >         > > generation
>> >         > > >> 3.13.10
>> >         > > >>> and the newest are on 3.18.7. All machines are
>> >         > using the
>> >         > > set_irq_affinity
>> >         > > >>> script to spread queue interrupts across
>> >         > available cores. All
>> >         > > machines
>> >         > > >> are
>> >         > > >>> configured with C1 as the maximum C-state and
>> >         > CPU clocks are
>> >         > > all steady
>> >         > > >>> between 3-3.2Ghz depending on the processor model.
>> >         > > >>>
>> >         > > >>> On the middle/newer boxes, lowering the number
>> >         > of RSS queues
>> >         > > manually
>> >         > > >> (i.e.
>> >         > > >>> RSS=8,8) seems to help reduce the amount of
>> >         > dropping, but it
>> >         > > does not
>> >         > > >>> eliminate it.
>> >         > > >>>
>> >         > > >>> The ring buffer drops do not seem to correlate
>> >         > with data
>> >         > > rates, either.
>> >         > > >> It
>> >         > > >>> does not seem that it is an issue of keeping up.
>> >         > In addition,
>> >         > > the boxes
>> >         > > >> are
>> >         > > >>> not under particularly heavy load. The CPU usage
>> >         > is generally
>> >         > > between
>> >         > > >> 3-5%
>> >         > > >>> and rarely spikes much higher than 15%. The load
>> >         > average is
>> >         > > generally
>> >         > > >>> around 2.
>> >         > > >>>
>> >         > > >>> I am at a loss for what else to try to diagnose
>> >         > and/or fix
>> >         > > this. In my
>> >         > > >>> mind, the newer boxes should have no problem at
>> >         > all keeping up
>> >         > > with the
>> >         > > >>> older ones.
>> >         > > >>>
>> >         > > >>> I've attached the output of ethtool -S, one from
>> >         > each
>> >         > > generation of
>> >         > > >> server.
>> >         > > >>>
>> >         > > >>>
>> >         > > >>>
>> >         > > >>> Thanks,
>> >         > > >>>
>> >         > > >>> Scott Silverman
>> >         > > >>
>> >         > > >> Scott,
>> >         > > >>
>> >         > > >> Have you tried running the CentOS5 w/ newer
>> >         > kernel on any of
>> >         > > your newer
>> >         > > >> servers, or CentOS6 on one of the older ones? I
>> >         > ask because
>> >         > > this would
>> >         > > >> seem to be one of the most significant
>> >         > differences between the
>> >         > > >> servers that are not dropping frames and those
>> >         > that are. I
>> >         > > suspect you
>> >         > > >> may have something in the CentOS6 configuration
>> >         > that is
>> >         > > responsible for
>> >         > > >> the drops that is not present in the CentOS5
>> >         > configuration. We
>> >         > > really
>> >         > > >> need to eliminate any OS based issues before we
>> >         > can really even
>> >         > > hope to
>> >         > > >> start chasing this issue down into the driver
>> >         > and/or device
>> >         > > configuration.
>> >         > > >>
>> >         > > >> Thanks,
>> >         > > >>
>> >         > > >> Alex
>> >         > >
>> >         > > >
>> >         > >
>> >         >
>> >
>> >         > >
>> >         > >
>> >         > >
>> >         > >
>> >         >
>> >         >
>> >         >
>> >         >
>> >
>> >
>> >
>>
>>
>
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
