Hi, Hari,
I think modern Linux network drivers use a "polling" approach (NAPI) rather than a purely interrupt-driven approach, so I've found IRQ affinity to be less important than it used to be. This can be observed as relatively low interrupt counts in
/proc/interrupts. The main things that I've found beneficial are:
1. Ensuring that the processing code runs on CPU cores in the same socket that the NIC's PCIe slot is connected to. If you have a multi-socket NUMA system, you will want to become familiar with its NUMA topology. The "hwloc" package includes the cool "lstopo" utility that will show you a lot about your system's topology. Even on a single-socket system it can help to stay away from core 0, where many OS things tend to run.
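For example, the kernel reports which NUMA node a PCIe device hangs off of via sysfs; a quick way to check (the "eth4" interface name is just a placeholder, and a value of -1 means the platform doesn't report a node) is something like:

    #include <stdio.h>

    int main(void)
    {
        /* Ask sysfs which NUMA node the NIC's PCIe device is attached to.
         * "eth4" is a placeholder interface name. */
        FILE *f = fopen("/sys/class/net/eth4/device/numa_node", "r");
        int node = -1;

        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        if (fscanf(f, "%d", &node) == 1)
            printf("eth4 is attached to NUMA node %d\n", node);
        fclose(f);
        return 0;
    }

Cross-check that against the picture lstopo draws, then keep your processing threads on cores in that node.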
2. Ensuring that memory allocations happen after your processes/threads have had their CPU affinity set, whether by "taskset", by "numactl", or by your application's own built-in CPU affinity setting code. This is mostly for NUMA systems: Linux places pages on the NUMA node of the CPU that first touches them, so memory allocated (or first touched) before the thread is pinned can end up on the wrong node.
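Here's a rough sketch of the ordering I mean, using sched_setaffinity() directly (the core number and buffer size are arbitrary). The point is that the buffer is allocated and first touched only after the thread is pinned:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        cpu_set_t set;

        /* Pin this process to core 4 (an arbitrary example core). */
        CPU_ZERO(&set);
        CPU_SET(4, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Allocate and touch the buffer *after* pinning so its pages
         * land on the NUMA node local to core 4 (first-touch policy). */
        size_t len = 1UL << 30;     /* 1 GiB, for example */
        char *buf = malloc(len);
        if (buf == NULL)
            return 1;
        memset(buf, 0, len);

        /* ... receive and process packets into buf ... */
        free(buf);
        return 0;
    }

taskset and numactl set the affinity before your program starts, which accomplishes the same thing as long as the allocations happen inside the pinned process.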
3. Ensuring that various buffers are sized appropriately. There are a number
of settings that can be tweaked in this category, most via "sysctl". I won't
dare to make any specific recommendations here. Everybody seems to have their
own set of "these are the settings I used last time". One of the most
important things you can do in your packet receiving code is to keep track of
how many packets you receive over a certain time interval. If this value does
not match the expected number of packets, then you have a problem. The difference will usually be that the received packet count is lower than the
expected packet count. Some people call these dropped packets, but I prefer to
call them "missed packets" at this point because all we can say is that we
didn't get them. We don't yet know what happened to them (maybe they were
dropped, maybe they were misdirected, maybe they were never sent), but it helps
to know where to look to find out.
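As a made-up illustration of the kind of bookkeeping I mean (the function and parameter names here are invented), something like this in the receive loop is usually enough:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <time.h>

    /* Count packets received per one-second interval and complain when
     * the count falls short of the expected rate. */
    void watch_packet_rate(int sockfd, uint64_t expected_per_sec)
    {
        char buf[9000];                 /* jumbo-frame sized scratch buffer */
        uint64_t count = 0;
        time_t interval_start = time(NULL);

        for (;;) {
            if (recv(sockfd, buf, sizeof(buf), 0) > 0)
                count++;

            time_t now = time(NULL);
            if (now - interval_start >= 1) {
                if (count < expected_per_sec)
                    fprintf(stderr, "missed about %llu packets this interval\n",
                            (unsigned long long)(expected_per_sec - count));
                count = 0;
                interval_start = now;
            }
        }
    }

The point is simply to count what you actually got and compare it to what you expected, on a regular schedule.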
4. Places to check to see where missing packets are getting "dropped":
4.1 If you are using "normal" (aka SOCK_DGRAM) sockets to receive UDP packets,
you will see a line in /proc/net/udp for your socket. The last number on that
line will be the number of packets that the kernel wanted to give to your
socket but couldn't because the socket's receive buffer was full, so the kernel had to drop them.
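If you'd rather watch that counter from code than eyeball it, here's a quick sketch that prints the local address (in the kernel's hex form) and the drops column for every UDP socket; matching the hex address/port to your own socket is left to you:

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/net/udp", "r");
        char line[512];
        char local[64];
        unsigned long drops;

        if (f == NULL) {
            perror("/proc/net/udp");
            return 1;
        }
        if (fgets(line, sizeof(line), f) == NULL) {   /* skip the header row */
            fclose(f);
            return 1;
        }
        while (fgets(line, sizeof(line), f) != NULL) {
            /* Fields: sl local rem st tx:rx tr:tm retrnsmt uid timeout
             * inode ref pointer drops -- we keep local and drops. */
            if (sscanf(line,
                       " %*s %63s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu",
                       local, &drops) == 2)
                printf("local %s  drops %lu\n", local, drops);
        }
        fclose(f);
        return 0;
    }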
4.2 If you are using "packet" (i.e. AF_PACKET, typically opened as SOCK_RAW) sockets to receive UDP packets, there are ways to get the total number of packets the kernel has handled for that socket and the number it had to drop because of a lack of kernel/application buffer space. I forget the exact details, but I'm sure you can google for it; a rough sketch from memory is below. If you're using Hashpipe's packet socket support, it has a function that will fetch these values for you.
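If memory serves, the underlying call is getsockopt() with the PACKET_STATISTICS option, something like this (untested sketch, assuming "fd" is your already-open packet socket):

    #include <linux/if_packet.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* Fetch kernel-side counters for a packet socket: tp_packets is the
     * total the kernel handled for this socket, tp_drops the number it
     * dropped for lack of buffer space. */
    int print_packet_stats(int fd)
    {
        struct tpacket_stats stats;
        socklen_t len = sizeof(stats);

        if (getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len) != 0) {
            perror("getsockopt(PACKET_STATISTICS)");
            return -1;
        }
        printf("packets: %u  drops: %u\n", stats.tp_packets, stats.tp_drops);
        return 0;
    }

Note that the kernel resets these counters each time you read them, so read them on a fixed schedule if you want rates.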
4.3 The ifconfig utility will give you a count of "RX errors". This is a
generic category and I don't know all possible contributions to it, but one is
that the NIC couldn't pass packets to the kernel.
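The same counters are also exported under /sys/class/net/IFACE/statistics/ if you want to poll them from code; for example (again, "eth4" is just a placeholder):

    #include <stdio.h>

    int main(void)
    {
        /* Cumulative receive-error count for the interface, the same
         * number ifconfig reports as "RX errors". */
        FILE *f = fopen("/sys/class/net/eth4/statistics/rx_errors", "r");
        unsigned long long errs;

        if (f == NULL || fscanf(f, "%llu", &errs) != 1) {
            perror("rx_errors");
            return 1;
        }
        printf("rx_errors: %llu\n", errs);
        fclose(f);
        return 0;
    }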
4.4 Using "ethtool -S IFACE" (e.g. "ethtool -S eth4") will show you loads of
stats. These values all come from counters on the NIC. Two interesting ones
are called something like "rx_dropped" and "rx_fifo_errors". A non-zero
rx_fifo_errors value means that the kernel was not keeping up with the packet
rate for long enough that the NIC/kernel buffers filled up and packets had to
be dropped.
4.5 If you're using a lower-level kernel bypass approach (e.g. IBVerbs or
DPDK), then you may have to dig a little harder to find the packet drop
counters, as the kernel is no longer involved and all the previously mentioned
counters will be useless (with the possible exception of the NIC counters).
4.6 You may be able to log in to your switch and query it for interface statistics. Those can show data and packet rates as well as counts of bytes sent, packets sent, and various errors.
One thing to remember about buffer sizes is that if your average processing
rate isn't keeping up with the data rate, larger buffers won't solve your
problem. Larger buffers will only allow the system to withstand slightly
longer temporary lulls in throughput ("hiccups") if the overall throughput of
the system (including the lulls/hiccups) is as fast as or (ideally) faster than
the incoming data rate.
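To put made-up numbers on that: the time a buffer buys you is roughly buffer_size / (input rate - processing rate). At 10 Gb/s in, a 1 GiB buffer rides out a complete processing stall for only about 0.86 seconds; if processing merely lags by 1 Gb/s, the same buffer lasts about 8.6 seconds; and if processing lags indefinitely, no buffer size will save you.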
Hope this helps,
Dave
> On Sep 9, 2020, at 22:15, Hariharan Krishnan <[email protected]>
> wrote:
>
> Hello Everyone,
>
> I'm trying to tune the NIC on a server with Ubuntu 18.04 OS
> to listen to a multicast network and optimize it for throughput through IRQ
> affinity binding. It is a Mellanox card and I tried using the "mlnx_tune" for
> doing this, but haven't been successful.
> I would really appreciate any help in this regard.
>
> Looking forward to responses from the group.
>
> Thank you.
>
> Regards,
>
> Hari
>