Daniel Braniss wrote:
>
> > On 24 Aug 2015, at 10:22, Hans Petter Selasky <[email protected]> wrote:
> >
> > On 08/24/15 01:02, Rick Macklem wrote:
> >> The other thing is the degradation seems to cut the rate by about half
> >> each time.
> >> 300-->150-->70 I have no idea if this helps to explain it.
> >
> > Might be a NUMA binding issue for the processes involved.
> >
> > man cpuset
> >
> > --HPS
>
> I can’t see how this is relevant, given that the same host, using the
> mellanox/mlxen
> behave much better.
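(For completeness, Hans's cpuset suggestion would be tried roughly along
these lines; the pid, CPU list, and nfsd arguments below are purely
illustrative, and the right CPUs are the ones on the NUMA domain nearest
the NIC -- see "man cpuset":)

```shell
# Hypothetical example of NUMA/CPU binding with cpuset(1) on FreeBSD.
# The pid and CPU list are placeholders.

# Show the current CPU binding of a process (pid is illustrative):
cpuset -g -p 1234

# Re-bind that process to CPUs 0-3:
cpuset -l 0-3 -p 1234

# Or start a program already pinned to those CPUs:
cpuset -l 0-3 /usr/sbin/nfsd -u -t -n 8
```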
Well, the "ix" driver has a bunch of tunables for things like "number of
queues" and, although I'll admit I don't understand how these queues are
used, I think they are related to CPUs and their caches. There is also
something called IXGBE_FDIR, which others have recommended be disabled.
(The code is #ifdef IXGBE_FDIR, but I don't know if it is defined for
your kernel?) There are also tunables for the interrupt rate and
something called hw.ixgbe_tx_process_limit, which appears to limit the
number of packets processed per transmit pass, or something like that?
(I suspect Hans would understand this stuff much better than I do, since
I don't understand it at all. ;-)
At a glance, the mellanox driver looks very different.
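(As a hedged illustration only: tunables like these are normally set at
boot from /boot/loader.conf. The names below match some FreeBSD releases
but not all, so verify them on your system before relying on any of
them:)

```shell
# /boot/loader.conf -- illustrative ix(4) tunables only; the exact names
# are version-dependent, so check "sysctl -a | grep ix" first.
hw.ix.num_queues="4"            # cap the number of RX/TX queue pairs
hw.ix.max_interrupt_rate="8000" # limit interrupts per second per queue
hw.ix.tx_process_limit="256"    # packets handled per TX cleanup pass
```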
> I’m getting different results with the intel/ix depending who is the nfs
> server
>
Who knows until you figure out what is actually going on. It could just be the
timing of
handling the write RPCs or when the different servers send acks for the TCP
segments or ...
that causes this for one server and not another.
One of the principles used when investigating airplane accidents is to
"never assume anything" and just try to collect the facts until the
pieces of the puzzle fall into place. I think the same principle works
for this kind of stuff.
I once had a case where a specific read of one NFS file would fail on certain
machines.
I won't bore you with the details, but after weeks we got to the point where we
had a lab
of identical machines (exactly the same hardware and exactly the same software
loaded on them)
and we could reproduce this problem on about half the machines and not the
other half. We
(myself and the guy I worked with) finally noticed the failing machines were on
network ports
for a given switch. We moved the net cables to another switch and the problem
went away.
--> This particular network switch was broken in such a way that it would
garble one specific
packet consistently, but worked fine for everything else.
My point here is that, if someone had suggested the "network switch might be
broken" at the
beginning of investigating this, I would have probably dismissed it, based on
"the network is
working just fine", but in the end, that was the problem.
--> I am not suggesting you have a broken network switch, just "don't take
anything off the
table until you know what is actually going on".
And to be honest, you may never know, but it is fun to try and solve these
puzzles.
Beyond what I already suggested, I'd look at the "ix" driver's stats and
tunables and
see if any of the tunables has an effect. (And, yes, it will take time to work
through these.)
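(A rough sketch of how to dump those stats and tunables with sysctl(8);
the OID names vary between releases, so grep rather than assume a
particular name:)

```shell
# Everything the driver exports for the first ix interface, including
# the per-queue packet/error counters:
sysctl dev.ix.0

# Loader tunables and global knobs (names are version-dependent):
sysctl -a | grep -i ixgbe
```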
Good luck with it, rick
>
> danny
>
> _______________________________________________
> [email protected] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "[email protected]"