On Sat, Sep 24, 2016 at 4:40 PM, Michał Purzyński
<michalpurzyns...@gmail.com> wrote:
> Thank you for being persistent with your answers.
>
> So right after sending the previous email I noticed that I had left behind
> some careless IRQ assignments from experimenting with IRQ and process CPU
> affinity. Both cards were hitting the same core, which (for the second card)
> was on a different NUMA node, and that core was saturated.
>
> The result was around 38% packet loss, calculated by comparing packets
> received with rx_missed. Interestingly, no other counter was increasing.
>
> Right now I have moved card 0 to core 0 and card 1 to the first core of the
> second CPU.
>
> Now the rx_missed is around 6-7% for each card. Still way too much.
>
> I send a total of 8-11 Gbit/sec to both cards, so each receives around half
> of that. The packet rate is 1.2 Mpps at most (also a total across both
> cards). All kinds of packet sizes.

So if you are doing packet analysis I assume you don't need LRO or
GRO.  If you don't, you may want to look into disabling them via
"ethtool -K".  I know RSC can sometimes cause packet drops due to
aggregating a number of frames before finally submitting them to the
host, although that usually requires ASPM to be enabled as well.
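
Something along these lines should do it (using the p1p1/p3p1
interface names from earlier in the thread, so adjust as needed):

# check what is currently enabled on each capture interface
ethtool -k p1p1 | grep -E 'large-receive-offload|generic-receive-offload'
# turn both offloads off
ethtool -K p1p1 lro off gro off
ethtool -K p3p1 lro off gro off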

> I'll lower the rings to 512 as the next step. Good to know about the card's
> limitations.
>
> Given that InterruptThrottleRate has to be given as a number of interrupts
> per second, what would you recommend I set it to, at least for a start? I
> have a 2.6 GHz Xeon E5 v3.

So I would recommend a value no less than 12500 for
InterruptThrottleRate.  Assuming a reasonable packet rate, that should
give you a decent trade-off in terms of performance versus latency.
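
For reference, 12500 interrupts/sec works out to one interrupt every
80 usec, which matches the guideline further down in the thread.  With
the out-of-tree driver that would be something like the following
(keeping the rest of your existing options, one value per port):

rmmod ixgbe
modprobe ixgbe MQ=0,0 RSS=1,1 DCA=2,2 VMDQ=0,0 LRO=0,0 InterruptThrottleRate=12500,12500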

> I'll buy a pair of X710 cards for a test as well. It will be an interesting
> comparison. Who knows, maybe the RSS implementation and MQ there are good
> enough to be usable for an IDS.
>
> Fortunately I don't run it inline; this server receives a copy of the
> traffic.

I'm not sure whether it will get you much more throughput or not.  I
still find it odd that you're dropping packets even though the device
isn't complaining about not having ring buffer resources.  Usually
that points to a bottleneck somewhere on the PCIe bus.  You might want
to double-check that the devices are connected directly to the root
complex and not to a secondary bus on a PCIe switch that is actually
downgrading the link between the device and the CPU socket.
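
A quick way to double check that from software is something like:

# show the PCIe tree so you can see whether the NICs sit behind a switch
lspci -tv
# then check the negotiated link on the NIC and on every bridge above it
# (substitute the card's bus:device.function address from plain lspci)
lspci -vvv -s <bus:dev.fn> | grep -E 'LnkCap|LnkSta'

You want to see "Speed 5GT/s, Width x8" reported at every hop between
the card and the root port.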

> On Sat, Sep 24, 2016 at 3:20 AM, Alexander Duyck <alexander.du...@gmail.com>
> wrote:
>>
>> Well, as a general rule anything over about 80 usecs for
>> InterruptThrottleRate is a waste.  One advantage of reducing the
>> interrupt throttle rate is that you can reduce the ring size and you
>> might see a slight performance improvement.  One problem with using
>> 4096 descriptors is that it greatly increases the cache footprint and
>> leads to more buffer-bloat and cache thrash as you have to evict old
>> descriptors to pull in new ones.  I'm also sure that if you are running
>> an intrusion detection system (I'm assuming that is what IDS is in
>> reference to), the users would appreciate it if you didn't add up to
>> half a dozen extra milliseconds of latency to their network (worst
>> case with an elephant flow of 1514 byte frames).
>>
>> What size packets are you working with?  One limitation of the 82599
>> is that it tops out at somewhere around 12 Mpps if you are using
>> something like 6 queues, and at only a little over 2 Mpps for a single
>> queue.  If you exceed 12 Mpps then the part will start reporting
>> rx_missed because the PCIe overhead for moving 64 byte packets is
>> great enough that it actually causes us to exceed the limits of the x8
>> gen2 link.  If the memcpy is what I think it is, then it allows us to
>> avoid having to do two different atomic operations that would have
>> been more expensive otherwise.
>>
>> On Fri, Sep 23, 2016 at 12:46 PM, Michał Purzyński
>> <michalpurzyns...@gmail.com> wrote:
>> > Here's what I did
>> >
>> > ethtool -A p1p1 rx off tx off
>> > ethtool -A p3p1 rx off tx off
>> >
>> > Both ethtool -a <interface> and the Arista that's pumping the data show
>> > that RX/TX pause is disabled.
>> >
>> > I have two cards, each connected to a separate NUMA node, threads
>> > pinned,
>> > etc.
>> >
>> > One non-standard thing is that I use a single queue only, because any
>> > form of multiqueue leads to packet reordering and confuses the IDS.
>> > That's an issue that has been hidden for a while in the NSM community.
>> >
>> > The driver (from SourceForge) was loaded with MQ=0 DCA=2 RSS=1 VMDQ=0
>> > InterruptThrottleRate=956 FCoE=0 LRO=0 vxvlan_rx=0 (each option's value
>> > given enough times so it applies to all cards in this system).
>> >
>> > I could see the same issue sending traffic to just one card.
>> >
>> > Of course a single core is swamped with acknowledging the hardware IRQs
>> > and then doing the softIRQ work (which seems to be mostly memcpy?). But
>> > then again, I don't see errors about lacking buffers (I run with 4096
>> > descriptors).
>> >
>> >
>> > On Fri, Sep 23, 2016 at 9:22 PM, Alexander Duyck
>> > <alexander.du...@gmail.com>
>> > wrote:
>> >>
>> >> When you say you disabled flow control, did you disable it on the
>> >> interface that is dropping packets or on the other end?  You might try
>> >> explicitly disabling it on the interface that is dropping packets;
>> >> that in turn should enable per-queue drop instead of putting
>> >> back-pressure onto the Rx FIFO.
>> >>
>> >> With flow control disabled on the local port you should see
>> >> rx_no_dma_resources start incrementing if the issue is that one of the
>> >> Rx rings is not keeping up.
>> >>
>> >> - Alex
>> >>
>> >> On Fri, Sep 23, 2016 at 11:09 AM, Michał Purzyński
>> >> <michalpurzyns...@gmail.com> wrote:
>> >> > xoff was increasing so I disabled flow control.
>> >> >
>> >> > That's an HP DL360 Gen9, and lspci -vvv tells me the cards are
>> >> > connected to an x8 link, the speed is 5GT/s, and ASPM is disabled.
>> >> >
>> >> > Other error counters are still zero. When I compared rx_packets and
>> >> > rx_missed_errors it looks like 38% (!!) of packets are getting lost.
>> >> >
>> >> > Unfortunately the HP documentation is a scam and they actively avoid
>> >> > publishing the motherboard layout.
>> >> >
>> >> > Any other place I could look for hints?
>> >> >
>> >> >
>> >> > On Fri, Sep 23, 2016 at 7:01 PM, Alexander Duyck
>> >> > <alexander.du...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 23, 2016 at 1:10 AM, Michał Purzyński
>> >> >> <michalpurzyns...@gmail.com> wrote:
>> >> >> > Hello.
>> >> >> >
>> >> >> > On my IDS workload with af_packet I can see rx_missed_errors
>> >> >> > growing
>> >> >> > while
>> >> >> > rx_no_buffer_count does not. Basically every other kind of rx_
>> >> >> > error
>> >> >> > counter is 0, including rx_no_dma_resources. It's an 82599 based
>> >> >> > card.
>> >> >> >
>> >> >> > I don't know what to think about that. I went through the ixgbe
>> >> >> > source code and the 82599 datasheet, and it seems like
>> >> >> > rx_missed_errors means a new packet overwrote something already in
>> >> >> > the packet buffer (the FIFO queue on the card) because there was no
>> >> >> > more space in it.
>> >> >> >
>> >> >> > Now, that would happen if there is no place to DMA packets into -
>> >> >> > but
>> >> >> > that
>> >> >> > counter does not grow.
>> >> >> >
>> >> >> > Could you point me to where I should be looking for the problem?
>> >> >> >
>> >> >> > --
>> >> >> > Michal Purzynski
>> >> >>
>> >> >> The Rx missed count will increment if you are not able to receive a
>> >> >> packet because the Rx FIFO is full.  If you are not seeing any
>> >> >> rx_no_dma_resources errors, it might indicate that the problem is not
>> >> >> with providing the DMA resources but with the bus itself.  You might
>> >> >> want to double-check the slot the device is connected to in order to
>> >> >> guarantee that there is an x8 link that supports 5GT/s all the way
>> >> >> through to the root complex.
>> >> >>
>> >> >> - Alex
>> >> >
>> >> >
>> >
>> >
>
>
