Hi, Hari,
I think modern Linux network drivers use a "polling" approach (NAPI) rather than a purely interrupt-driven approach, so I've found IRQ affinity to be less important than it used to be. This can be observed as relatively low interrupt counts in
/proc/interrupts. The main things that I've found beneficial are:
1. Ensuring that the processing code runs on CPU cores in the same socket that the NIC's PCIe slot is connected to. If you have a multi-socket NUMA system, you will want to become familiar with its NUMA topology. The "hwloc" package includes the cool "lstopo" utility that will show you a lot about your system's topology. Even on a single-socket system it can help to stay away from core 0, where many OS things tend to run.
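For example, the kernel reports which NUMA node a PCIe device hangs off of via sysfs; a quick way to check (the "eth4" interface name is just a placeholder, and a value of -1 means the platform doesn't report a node) is something like:

    #include <stdio.h>

    int main(void)
    {
        /* Ask sysfs which NUMA node the NIC's PCIe device is attached to.
         * "eth4" is a placeholder interface name. */
        FILE *f = fopen("/sys/class/net/eth4/device/numa_node", "r");
        int node = -1;

        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        if (fscanf(f, "%d", &node) == 1)
            printf("eth4 is attached to NUMA node %d\n", node);
        fclose(f);
        return 0;
    }

Cross-check that against the picture lstopo draws, then keep your processing threads on cores in that node.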
2. Ensuring that memory allocations happen after your processes/threads have had their CPU affinity set, whether by "taskset", by "numactl", or by your application's own built-in CPU affinity setting code. This is mostly for NUMA systems: Linux places pages on the NUMA node of the CPU that first touches them, so memory allocated (or first touched) before the thread is pinned can end up on the wrong node.
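Here's a rough sketch of the ordering I mean, using sched_setaffinity() directly (the core number and buffer size are arbitrary). The point is that the buffer is allocated and first touched only after the thread is pinned:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        cpu_set_t set;

        /* Pin this process to core 4 (an arbitrary example core). */
        CPU_ZERO(&set);
        CPU_SET(4, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Allocate and touch the buffer *after* pinning so its pages
         * land on the NUMA node local to core 4 (first-touch policy). */
        size_t len = 1UL << 30;     /* 1 GiB, for example */
        char *buf = malloc(len);
        if (buf == NULL)
            return 1;
        memset(buf, 0, len);

        /* ... receive and process packets into buf ... */
        free(buf);
        return 0;
    }

taskset and numactl set the affinity before your program starts, which accomplishes the same thing as long as the allocations happen inside the pinned process.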
3. Ensuring that various buffers are sized appropriately. There are a number
of settings that can be tweaked in this category, most via "sysctl". I won't
dare to make any specific recommendations here. Everybody seems to have their
own set of "these are the settings I used last time". One of the most
important things you can do in your packet receiving code is to keep track of
how many packets you receive over a certain time interval. If this value does
not match the expected number of packets, then you have a problem. The difference will usually be that the received packet count is lower than the
expected packet count. Some people call these dropped packets, but I prefer to
call them "missed packets" at this point because all we can say is that we
didn't get them. We don't yet know what happened to them (maybe they were
dropped, maybe they were misdirected, maybe they were never sent), but it helps
to know where to look to find out.
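As a made-up illustration of the kind of bookkeeping I mean (the function and parameter names here are invented), something like this in the receive loop is usually enough:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <time.h>

    /* Count packets received per one-second interval and complain when
     * the count falls short of the expected rate. */
    void watch_packet_rate(int sockfd, uint64_t expected_per_sec)
    {
        char buf[9000];                 /* jumbo-frame sized scratch buffer */
        uint64_t count = 0;
        time_t interval_start = time(NULL);

        for (;;) {
            if (recv(sockfd, buf, sizeof(buf), 0) > 0)
                count++;

            time_t now = time(NULL);
            if (now - interval_start >= 1) {
                if (count < expected_per_sec)
                    fprintf(stderr, "missed about %llu packets this interval\n",
                            (unsigned long long)(expected_per_sec - count));
                count = 0;
                interval_start = now;
            }
        }
    }

The point is simply to count what you actually got and compare it to what you expected, on a regular schedule.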
4. Places to check to see where missing packets are getting "dropped":
4.1 If you are using "normal" (aka SOCK_DGRAM) sockets to receive UDP packets,
you will see a line in /proc/net/udp for your socket. The last number on that
line will be the number of packets that the kernel wanted to give to your
socket but couldn't because the socket's receive buffer was full, so the kernel had to drop them.
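If you'd rather watch that counter from code than eyeball it, here's a quick sketch that prints the local address (in the kernel's hex form) and the drops column for every UDP socket; matching the hex address/port to your own socket is left to you:

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/net/udp", "r");
        char line[512];
        char local[64];
        unsigned long drops;

        if (f == NULL) {
            perror("/proc/net/udp");
            return 1;
        }
        if (fgets(line, sizeof(line), f) == NULL) {   /* skip the header row */
            fclose(f);
            return 1;
        }
        while (fgets(line, sizeof(line), f) != NULL) {
            /* Fields: sl local rem st tx:rx tr:tm retrnsmt uid timeout
             * inode ref pointer drops -- we keep local and drops. */
            if (sscanf(line,
                       " %*s %63s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu",
                       local, &drops) == 2)
                printf("local %s  drops %lu\n", local, drops);
        }
        fclose(f);
        return 0;
    }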
4.2 If you are using "packet" (i.e. AF_PACKET, typically opened as SOCK_RAW) sockets to receive UDP packets, there are ways to get the total number of packets the kernel has handled for that socket and the number it had to drop because of a lack of kernel/application buffer space. I forget the exact details, but I'm sure you can google for it; a rough sketch from memory is below. If you're using Hashpipe's packet socket support, it has a function that will fetch these values for you.
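If memory serves, the underlying call is getsockopt() with the PACKET_STATISTICS option, something like this (untested sketch, assuming "fd" is your already-open packet socket):

    #include <linux/if_packet.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* Fetch kernel-side counters for a packet socket: tp_packets is the
     * total the kernel handled for this socket, tp_drops the number it
     * dropped for lack of buffer space. */
    int print_packet_stats(int fd)
    {
        struct tpacket_stats stats;
        socklen_t len = sizeof(stats);

        if (getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len) != 0) {
            perror("getsockopt(PACKET_STATISTICS)");
            return -1;
        }
        printf("packets: %u  drops: %u\n", stats.tp_packets, stats.tp_drops);
        return 0;
    }

Note that the kernel resets these counters each time you read them, so read them on a fixed schedule if you want rates.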
4.3 The ifconfig utility will give you a count of "RX errors". This is a
generic category and I don't know all possible contributions to it, but one is
that the NIC couldn't pass packets to the kernel.
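The same counters are also exported under /sys/class/net/IFACE/statistics/ if you want to poll them from code; for example (again, "eth4" is just a placeholder):

    #include <stdio.h>

    int main(void)
    {
        /* Cumulative receive-error count for the interface, the same
         * number ifconfig reports as "RX errors". */
        FILE *f = fopen("/sys/class/net/eth4/statistics/rx_errors", "r");
        unsigned long long errs;

        if (f == NULL || fscanf(f, "%llu", &errs) != 1) {
            perror("rx_errors");
            return 1;
        }
        printf("rx_errors: %llu\n", errs);
        fclose(f);
        return 0;
    }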
4.4 Using "ethtool -S IFACE" (e.g. "ethtool -S eth4") will show you loads of
stats. These values all come from counters on the NIC. Two interesting ones
are called something like "rx_dropped" and "rx_fifo_errors". A non-zero
rx_fifo_errors value means that the kernel was not keeping up with the packet
rate for long enough that the NIC/kernel buffers filled up and packets had to
be dropped.
4.5 If you're using a lower-level kernel bypass approach (e.g. IBVerbs or
DPDK), then you may have to dig a little harder to find the packet drop
counters, as the kernel is no longer involved and all the previously mentioned
counters will be useless (with the possible exception of the NIC counters).
4.6 You may be able to log in to your switch and query it for interface statistics. Those can show data and packet rates as well as counts of bytes sent, packets sent, and various errors.
One thing to remember about buffer sizes is that if your average processing
rate isn't keeping up with the data rate, larger buffers won't solve your
problem. Larger buffers will only allow the system to withstand slightly
longer temporary lulls in throughput ("hiccups") if the overall throughput of
the system (including the lulls/hiccups) is as fast as or (ideally) faster than
the incoming data rate.
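To put made-up numbers on that: the time a buffer buys you is roughly buffer_size / (input rate - processing rate). At 10 Gb/s in, a 1 GiB buffer rides out a complete processing stall for only about 0.86 seconds; if processing merely lags by 1 Gb/s, the same buffer lasts about 8.6 seconds; and if processing lags indefinitely, no buffer size will save you.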
Hope this helps,
Dave
> On Sep 9, 2020, at 22:15, Hariharan Krishnan <[email protected]>
> wrote:
>
> Hello Everyone,
>
> I'm trying to tune the NIC on a server with Ubuntu 18.04 OS
> to listen to a multicast network and optimize it for throughput through IRQ
> affinity binding. It is a Mellanox card and I tried using the "mlnx_tune" for
> doing this, but haven't been successful.
> I would really appreciate any help in this regard.
>
> Looking forward to responses from the group.
>
> Thank you.
>
> Regards,
>
> Hari
>