Hi Don,

Below is a snippet of the full log... How can I tell that the traffic only goes into 2 queues? I see more than 2 queues with similar packet counts... Can you explain more?
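One quick way to see which queues are actually taking traffic (just a sketch; ethX is a placeholder for the real interface name) is to watch only the per-queue packet counters from the same ethtool stats:

    # Filter the per-queue RX packet counters out of the ethtool stats and
    # refresh once a second; "ethX" stands in for the actual interface name.
    watch -d -n1 "ethtool -S ethX | grep -E 'rx_queue_[0-9]+_packets'"
    # Counters that keep climbing mark the queues RSS is currently using;
    # a non-zero counter that never moves is just leftover from earlier traffic.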
If it is two queues, would that imply that 2 cores handle the 2 flows? But from watch -d -n1 cat /proc/interrupts I can see the interrupt rate increasing at a similar rate on all of the cores handling the Ethernet interrupts.

About our traffic: it is basically the same 34 Mbps stream sent to 240 multicast addresses (225.82.10.0 - 225.82.10.119 and 225.82.11.0 - 225.82.11.119). The receiver opens 240 sockets to pull the data out, check the size, and then toss it (this is for test purposes). The test application takes the number of threads as a command-line argument; each thread handles 240 / N connections, where N is the number of threads. I don't see much difference in behavior either way.

Thanks!

Hank

rx_queue_0_packets: 1105903
rx_queue_0_bytes: 1501816274
rx_queue_0_bp_poll_yield: 0
rx_queue_0_bp_misses: 0
rx_queue_0_bp_cleaned: 0
rx_queue_1_packets: 1108639
rx_queue_1_bytes: 1505531762
rx_queue_1_bp_poll_yield: 0
rx_queue_1_bp_misses: 0
rx_queue_1_bp_cleaned: 0
rx_queue_2_packets: 0
rx_queue_2_bytes: 0
rx_queue_2_bp_poll_yield: 0
rx_queue_2_bp_misses: 0
rx_queue_2_bp_cleaned: 0
rx_queue_3_packets: 0
rx_queue_3_bytes: 0
rx_queue_3_bp_poll_yield: 0
rx_queue_3_bp_misses: 0
rx_queue_3_bp_cleaned: 0
rx_queue_4_packets: 1656985
rx_queue_4_bytes: 2250185630
rx_queue_4_bp_poll_yield: 0
rx_queue_4_bp_misses: 0
rx_queue_4_bp_cleaned: 0
rx_queue_5_packets: 1107023
rx_queue_5_bytes: 1503337234
rx_queue_5_bp_poll_yield: 0
rx_queue_5_bp_misses: 0
rx_queue_5_bp_cleaned: 0
rx_queue_6_packets: 0
rx_queue_6_bytes: 0
rx_queue_6_bp_poll_yield: 0
rx_queue_6_bp_misses: 0
rx_queue_6_bp_cleaned: 0
rx_queue_7_packets: 0
rx_queue_7_bytes: 0
rx_queue_7_bp_poll_yield: 0
rx_queue_7_bp_misses: 0
rx_queue_7_bp_cleaned: 0
rx_queue_8_packets: 0
rx_queue_8_bytes: 0
rx_queue_8_bp_poll_yield: 0
rx_queue_8_bp_misses: 0
rx_queue_8_bp_cleaned: 0
rx_queue_9_packets: 0
rx_queue_9_bytes: 0
rx_queue_9_bp_poll_yield: 0
rx_queue_9_bp_misses: 0
rx_queue_9_bp_cleaned: 0
rx_queue_10_packets: 1668431
rx_queue_10_bytes: 2265729298
rx_queue_10_bp_poll_yield: 0
rx_queue_10_bp_misses: 0
rx_queue_10_bp_cleaned: 0
rx_queue_11_packets: 1106051
rx_queue_11_bytes: 1502017258
rx_queue_11_bp_poll_yield: 0
rx_queue_11_bp_misses: 0
rx_queue_11_bp_cleaned: 0
rx_queue_12_packets: 0
rx_queue_12_bytes: 0
rx_queue_12_bp_poll_yield: 0
rx_queue_12_bp_misses: 0
rx_queue_12_bp_cleaned: 0
rx_queue_13_packets: 0
rx_queue_13_bytes: 0
rx_queue_13_bp_poll_yield: 0
rx_queue_13_bp_misses: 0
rx_queue_13_bp_cleaned: 0
rx_queue_14_packets: 1107157
rx_queue_14_bytes: 1503519206
rx_queue_14_bp_poll_yield: 0
rx_queue_14_bp_misses: 0
rx_queue_14_bp_cleaned: 0
rx_queue_15_packets: 1107574
rx_queue_15_bytes: 1504085492
rx_queue_15_bp_poll_yield: 0
rx_queue_15_bp_misses: 0
rx_queue_15_bp_cleaned: 0
rx_queue_16_packets: 0
rx_queue_16_bytes: 0
rx_queue_16_bp_poll_yield: 0
rx_queue_16_bp_misses: 0
rx_queue_1

On Wed, Sep 7, 2016 at 5:04 PM, Skidmore, Donald C <donald.c.skidm...@intel.com> wrote:

> Hey Hank,
>
> Well, it looks like all your traffic is just hashing to 2 queues. You have
> ATR enabled but it isn’t being used, due to this being UDP traffic. That
> isn’t a problem, since the RSS hash will occur on anything that doesn’t
> match ATR (in your case, everything). All this means is that you only have
> 2 flows, and thus all the work is being done by only two queues. To get a
> better hash spread you could modify the RSS hash key, but I would first
> look at your traffic to see whether you even have more than 2 flows
> operating. Maybe something can be done in the application to allow for
> more parallelism, for instance running four threads (assuming each thread
> opens its own socket)?
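A sketch of how that RSS spread and hash setup could be inspected and adjusted with ethtool (ethX is a placeholder for the actual interface; support for these options depends on the driver and ethtool version):

    # Which header fields are hashed for UDP/IPv4, and how the hash
    # indirection table maps hash values onto queues.
    ethtool -n ethX rx-flow-hash udp4
    ethtool -x ethX
    # If the driver allows it: respread the indirection table evenly over the
    # first 16 queues, and/or include the UDP ports in the hash.
    ethtool -X ethX equal 16
    ethtool -N ethX rx-flow-hash udp4 sdfn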
>
> As for the rx_no_dma_resources counter, it is tied directly to one of our
> HW counters. It gets bumped if the target queue is disabled (unlikely in
> your case) or there are no free descriptors in the target queue. The latter
> makes sense here, since all of your traffic is going to just two queues
> that appear to not be getting drained fast enough.
>
> Thanks,
> -Don <donald.c.skidm...@intel.com>
>
> *From:* Hank Liu [mailto:hank.tz...@gmail.com]
> *Sent:* Wednesday, September 07, 2016 4:51 PM
> *To:* Skidmore, Donald C <donald.c.skidm...@intel.com>
> *Cc:* Rustad, Mark D <mark.d.rus...@intel.com>; e1000-devel@lists.sourceforge.net
> *Subject:* Re: [E1000-devel] Intel 82599 AXX10GBNIAIOM cards for 10G SFPs UDP performance issue
>
> Hi Don,
>
> I got a log for you to look at. See attached...
>
> Thanks, and let me know. BTW, can anyone tell me what could cause
> rx_no_dma_resources?
>
> Hank
>
> On Wed, Sep 7, 2016 at 4:04 PM, Skidmore, Donald C <donald.c.skidm...@intel.com> wrote:
>
> ATR is application targeted receive. It may be useful for you, but the flow
> isn’t directed to a CPU until you transmit, and since you mentioned you
> don’t do much transmission it would have to be via the ACKs. Likewise, the
> flows need to stick around for a while to gain any advantage from it.
> Still, it wouldn’t hurt to test using the ethtool command Alex mentioned in
> another email.
>
> In general I would like to see you just go with the default of 16 RSS
> queues and not attempt to mess with the affinitization of the interrupt
> vectors. If the performance is still bad, I would be interested in how the
> flows are being distributed between the queues. You can see this via the
> per-queue packet counts you get out of the ethtool stats. What I want to
> eliminate is the possibility that RSS is seeing all your traffic as one
> flow.
>
> Thanks,
> -Don <donald.c.skidm...@intel.com>
>
> *From:* Hank Liu [mailto:hank.tz...@gmail.com]
> *Sent:* Wednesday, September 07, 2016 3:40 PM
> *To:* Rustad, Mark D <mark.d.rus...@intel.com>
> *Cc:* Skidmore, Donald C <donald.c.skidm...@intel.com>; e1000-devel@lists.sourceforge.net
> *Subject:* Re: [E1000-devel] Intel 82599 AXX10GBNIAIOM cards for 10G SFPs UDP performance issue
>
> Mark,
>
> Thanks!
>
> The test app can specify how many pthreads handle the connections. I have
> tried 4, 8, 16, etc., but none of them makes a significant difference. CPU
> usage on the receive end is moderate (50-60%). If I want to poll
> aggressively to prevent any drops in the UDP layer, it might go up a bit.
> On the CPU set that handles the network interrupts, I did pin those CPUs,
> and I can see the interrupt rate is pretty even across all the CPUs
> involved.
>
> Since I am seeing a lot of rx_no_dma_resources, and this counter is read
> out of the 82599 controller, I would like to know why it happens. Note: I
> already bumped the RX ring size to the maximum (4096) I can set with
> ethtool.
>
> BTW, what is ATR? I didn't set up any filter...
>
> Hank
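A small sketch of how those settings could be double-checked (ethX is again a placeholder, and the fdir counters only appear if the driver exposes them):

    # Ring sizes actually in effect vs. the hardware maximum.
    ethtool -g ethX
    # Whether ntuple filtering is on; on ixgbe, enabling ntuple generally
    # switches Flow Director out of ATR (sampling) mode.
    ethtool -k ethX | grep ntuple
    # Flow Director match/miss activity, if the driver exposes these counters.
    ethtool -S ethX | grep -i fdir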
> On Wed, Sep 7, 2016 at 2:19 PM, Rustad, Mark D <mark.d.rus...@intel.com> wrote:
>
> Hank Liu <hank.tz...@gmail.com> wrote:
>
> *From:* Hank Liu [mailto:hank.tz...@gmail.com]
> *Sent:* Wednesday, September 07, 2016 10:20 AM
> *To:* Skidmore, Donald C <donald.c.skidm...@intel.com>
> *Cc:* e1000-devel@lists.sourceforge.net
> *Subject:* Re: [E1000-devel] Intel 82599 AXX10GBNIAIOM cards for 10G SFPs UDP performance issue
>
> Thanks for the quick response and the help. I guess what I didn't make
> clear is that the application (receiver, sender) opens 240 connections, and
> each connection carries 34 Mbps of traffic.
>
> You say that there are 240 connections, but how many threads is your app
> using? One per connection? What does the CPU utilization look like on the
> receiving end?
>
> Also, the current ATR implementation does not support UDP, so you are
> probably better off not pinning the app threads at all and trusting that
> the scheduler will migrate them to the CPU that is getting their packets
> via RSS. You should still set the affinity of the interrupts in that case.
> The default number of queues should be fine.
>
> --
> Mark Rustad, Networking Division, Intel Corporation
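For the interrupt affinity part, a rough sketch (ethX and IRQ 123 are placeholders; check /proc/interrupts for the real vector names and numbers):

    # List the interface's interrupt vectors and their per-CPU counts.
    grep ethX /proc/interrupts
    # Steer one vector (say IRQ 123) to one CPU (say CPU 2); repeat per vector.
    echo 2 > /proc/irq/123/smp_affinity_list
    # The out-of-tree ixgbe driver package also ships a set_irq_affinity
    # script that spreads all of a device's vectors across CPUs in one go.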