From: Koehrer Mathias
> Sent: 13 October 2016 11:57
> > The time between my trace points 700 and 701 is about 30us; the time
> > between my trace points 600 and 601 is even 37us!
> > The code in between is
> > tsyncrxctl = rd32(E1000_TSYNCRXCTL); and
> > lvmmc = rd32(E1000_LVMMC); respectively.
> > In both cases this is a single read from a register.
> > I have no idea why this single read could take that much time!
> > Is it possible that the igb hardware is in a state that delays the
> > read access, and that this is why the whole I/O system is delayed?
> To have a proper comparison, I did the same with kernel 3.18.27-rt27.
> Also here, I instrumented the igb driver to get traces for the rd32 calls.
> However, here everything is generally much faster!
> In the idle system the maximum I got for a read was about 6us; most of
> the time it was 1-2us.
1-2us is probably about right; PCIe is high-throughput, high-latency.
You should see the latencies we get talking to FPGAs!
> On the 4.8 kernel this is always much slower (see above).
> My question is now: Is there a kernel config option, introduced in the
> meantime, that may lead to this effect and which is not set in my 4.8
> config?
Have a look at the generated code for rd32().
Someone might have added a load of synchronisation instructions to it.
On x86 I don't think it needs any.
It is also possible for other PCIe accesses to slow things down
(which might be why you see 6us).
I presume you are doing these comparisons on the same hardware?
Obscure bus topologies could slow things down.