First, sorry for the delay; I was away for a while.
We do not use Ubuntu but Debian, though the two distributions are in
fact two flavours of the same thing.
On each machine we manage a UDP download link from a ROACH2. The ROACH2
does nothing but add an 8-byte counter to each 8K data block. That way
we can precisely measure the packet loss rate.
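As an illustration of the counter-based measurement described above, here is a minimal Python sketch. It assumes the counter is a 64-bit big-endian integer in the first 8 bytes of each block, that it increases by one per block, and that blocks arrive in order; none of these details are stated in this message, and the function names are hypothetical.

```python
import struct


def read_counter(packet):
    """Extract the block counter from a received packet.

    Assumption: the counter is an unsigned 64-bit big-endian integer
    stored in the first 8 bytes of the payload.
    """
    return struct.unpack_from(">Q", packet, 0)[0]


def count_lost_blocks(counters):
    """Return (received, lost) for counters listed in arrival order.

    Assumption: the sender increments the counter by one per block and
    blocks are not reordered, so any jump larger than one is a loss.
    """
    received = 0
    lost = 0
    previous = None
    for counter in counters:
        received += 1
        if previous is not None and counter > previous + 1:
            lost += counter - previous - 1
        previous = counter
    return received, lost


def loss_rate(counters):
    """Fraction of blocks lost among all blocks the sender emitted."""
    received, lost = count_lost_blocks(counters)
    total = received + lost
    return lost / total if total else 0.0
```

In a real receiver, `read_counter` would be applied to each datagram returned by the socket, and the running totals would be kept rather than a full list of counters.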
First, notice that the driver writers for 10GbE NICs found it wise to
split the interrupts across either six or seven IRQ numbers. Why? I do
not have the slightest idea. Then comes the worst part. We run at
1.1 Gsamples/sec, so we are very close to the link capacity. With the
standard setup the loss may rise to 5%, with a typical value of 1%. I
suspect a cache problem. On our 8 (real) core system, the IRQs can be
spread over all the cores, but each core has to be aware of what the
others are currently doing. This coherence issue may cost some cycles.
Acting on that guess, I assigned all the IRQs of a given interface to a
single core (/proc/irq/xx/smp_affinity). At the same time I moved
everything else off that core (smp_affinity and taskset). It worked:
the loss is now around 10^-6, which we find acceptable.
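The value written to /proc/irq/xx/smp_affinity is a hexadecimal CPU bitmask, with comma-separated 32-bit groups on machines with more than 32 CPUs. The following Python sketch shows one way to build that mask and write it; the function names and the `proc_root` parameter are hypothetical, the IRQ numbers must be looked up in /proc/interrupts for the interface in question, and the write needs root privileges.

```python
def cpu_mask(cpus):
    """Build the hex bitmask string expected by smp_affinity.

    CPU n corresponds to bit n. Masks wider than 32 bits are emitted as
    comma-separated 32-bit groups, most significant group first.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    groups = []
    while mask:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
    head, *rest = reversed(groups)
    return ",".join([format(head, "x")] + [format(g, "08x") for g in rest])


def pin_irq(irq, cpus, proc_root="/proc/irq"):
    """Write the affinity mask for one IRQ (requires root on a real system)."""
    path = f"{proc_root}/{irq}/smp_affinity"
    with open(path, "w") as handle:
        handle.write(cpu_mask(cpus))
```

For example, `cpu_mask([3])` gives `8`, so calling `pin_irq` with that CPU list for each of the interface's IRQ numbers would send all of them to core 3.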
The next plague is named irqbalance, which overrides your IRQ
assignments. Remove it:
aptitude remove irqbalance
That's harmless.
You may also wish to get rid of systemd, which burns cycles for a
questionable purpose, but removing it is risky. We took that option
anyway.
systemd gets worse with each OS release.
Jean Borsenberger
On 22/09/18 01:56 PM, Gary, Dale E. wrote:
Hi All,
We are running a multi-core (32-core) system at Owens Valley that has
a dual-port Myricom 10GBe NIC. We ran the system very successfully
under Ubuntu 12.04 for more than 1 year, but after upgrading to Ubuntu
18.04 (generic) we are now experiencing reliability problems, despite
the tuning parameters and smp_affinity adjustments being (as far as we
can tell) the same. The problem seems to be somehow associated with
system load and packet handling rather than receipt of the packets by
the interface, since things run fine for up to 10 minutes, then start
to deteriorate. In researching this, I see various other flavors of
Ubuntu (low-latency, realtime, rt, preempt) that make kernel
adjustments that might help, but I am not able to tell from the
descriptions which if any of these might address the problem. Has
anyone had a similar experience, and/or have advice about what options
we might have? I am using the myri10ge driver that came with Ubuntu
18.04.
One thing I might mention is that I ran this script:
https://github.com/majek/dump/blob/master/how-to-receive-a-packet/softnet.sh,
and find a certain number of "squeezed" packets, which are "# of times
ksoftirq ran out of netdev_budget or time slice with work remaining."
I don't know if this is something to worry about? The output of
softnet.sh is like this. Note we had the NIC assigned to cpus 1 and
2, but changed to 30 and 31.
user@dpp:~$ ./softnet.sh
cpu      total dropped squeezed collision rps flow_limit
  0    1328082       0     3729         0   0          0
  1 1716559544       0  7208929         0   0          0
  2 1793125842       0  8158475         0   0          0
  3    1069150       0     3714         0   0          0
  4    1400569       0     5443         0   0          0
  5    6988379       0     5985         0   0          0
  6    6466640       0     5950         0   0          0
  7    1070366       0     4097         0   0          0
  8     878808       0     3906         0   0          0
  9     933541       0     4207         0   0          0
 10       1229       0        4         0   0          0
 11        848       0        0         0   0          0
 12       1310       0        5         0   0          0
 13        662       0        0         0   0          0
 14       1304       0        2         0   0          0
 15        680       0        3         0   0          0
 16       1817       0        2         0   0          0
 17        648       0        3         0   0          0
 18        742       0        2         0   0          0
 19        605       0        2         0   0          0
 20        690       0        2         0   0          0
 21        536       0        3         0   0          0
 22        860       0        0         0   0          0
 23        493       0        3         0   0          0
 24       1657       0        4         0   0          0
 25    9244642       0     1487         0   0          0
 26        912       0        2         0   0          0
 27        287       0        0         0   0          0
 28    5252171       0      877         0   0          0
 29        339       0        3         0   0          0
 30 3378532079       0 17299324         0   0          0
 31 3390959304       0 16129528         0   0          0
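To put the squeezed counts in proportion to the traffic each CPU handled, the output above can be parsed with a short Python sketch like the one below. It assumes the whitespace-separated decimal layout printed by softnet.sh as shown here (cpu, total, dropped, squeezed, ...); the function names are hypothetical. The counters are cumulative, so the ratio is an average over the whole uptime, not a current rate.

```python
def parse_softnet_table(text):
    """Parse softnet.sh output into a list of per-CPU dictionaries.

    Lines that do not start with a CPU number (the shell prompt and the
    header row) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        cpu, total, dropped, squeezed = (int(x) for x in fields[:4])
        rows.append(
            {"cpu": cpu, "total": total, "dropped": dropped, "squeezed": squeezed}
        )
    return rows


def busiest(rows, n=2):
    """Return the n CPUs that processed the most packets."""
    return sorted(rows, key=lambda row: row["total"], reverse=True)[:n]


def squeeze_ratio(row):
    """Squeezed events per processed packet for one CPU."""
    return row["squeezed"] / row["total"] if row["total"] else 0.0
```

Applied to the table above, `busiest` picks out CPUs 31 and 30, and `squeeze_ratio` for CPU 30 is 17299324 / 3378532079, roughly 0.5%.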
Thanks,
Dale
--
You received this message because you are subscribed to the Google
Groups "[email protected]" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To post to this group, send email to [email protected]
<mailto:[email protected]>.