On Tuesday, 15 August 2017 17:41:54 CEST, Matthew Dillon wrote:
The overhead is around 60KB/process.  The pipe adds another 16KB/process or
so.   Multiply 76KB x 900000 (76e3 * 900000.0) and the total system overhead
is around 68GB.

aaah, math :). I probably had terabytes in mind ;-)
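Just to convince myself of the units, here is nothing more than the
arithmetic from the quote above, spelled out (trivial sketch, the numbers
are straight from Matt's mail):

/* redo the 76KB x 900000 estimate with explicit units */
#include <stdio.h>

int
main(void)
{
        double per_proc_kb = 60.0 + 16.0;       /* process + pipe overhead */
        double nprocs = 900000.0;
        double total_gb = per_proc_kb * 1e3 * nprocs / 1e9;

        printf("total overhead ~ %.1f GB\n", total_gb);   /* ~68.4 GB */
        return (0);
}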

Yah, the pipe test is very cool.  What's really interesting about it is
testing various numbers of processes and looking at the Intel pcm.x
output.  As the number of processes increases, first the IPC goes to hell
(instruction efficiency drops to 0.2), then the L3 cache blows up and the
hardware starts having to access dynamic RAM for everything.  It is an
intentionally CPU-inefficient test.

I'm amazed that the CPUs can even do 0.2 IPC under these conditions.
That's actually quite impressive.
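For anyone who wants to poke at this themselves: the actual test program
isn't shown in this thread, but a minimal sketch of that kind of pipe test
could look something like the following (NPAIRS/NROUNDS are made-up knobs,
error handling is minimal, and getting anywhere near 450000 pairs of course
needs a kernel tuned for ~900k processes):

/*
 * Hypothetical sketch only -- not the real test program.  Forks NPAIRS
 * pairs of processes that ping-pong one byte over two pipes, so nearly
 * all of them sit blocked in the kernel at any given time.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPAIRS  1000            /* scale toward 450000 pairs (~900k procs) */
#define NROUNDS 100000

int
main(void)
{
        for (int i = 0; i < NPAIRS; ++i) {
                int ab[2], ba[2];
                char c = 0;

                if (pipe(ab) < 0 || pipe(ba) < 0) {
                        perror("pipe");
                        exit(1);
                }
                if (fork() == 0) {
                        /* child A: write first, then wait for the echo */
                        for (int j = 0; j < NROUNDS; ++j) {
                                if (write(ab[1], &c, 1) != 1 ||
                                    read(ba[0], &c, 1) != 1)
                                        _exit(1);
                        }
                        _exit(0);
                }
                if (fork() == 0) {
                        /* child B: read, then echo the byte back */
                        for (int j = 0; j < NROUNDS; ++j) {
                                if (read(ab[0], &c, 1) != 1 ||
                                    write(ba[1], &c, 1) != 1)
                                        _exit(1);
                        }
                        _exit(0);
                }
                close(ab[0]); close(ab[1]);
                close(ba[0]); close(ba[1]);
        }
        while (wait(NULL) > 0)          /* parent: wait for all children */
                ;
        return (0);
}

While that runs, one would watch the IPC numbers from pcm (or whatever
counter tool is at hand) as NPAIRS is cranked up.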

How is this IPC counted? Is this the average IPC collected over all hyperthreads?

Once the L3 cache blows up, does the IPC still stay at 0.2?
IIRC there is about a 10x latency difference between an L3 hit and a
dynamic RAM access (600 cycles?), which would suggest that the IPC should
suffer from that. On the other hand, there seem to be enough instructions
in flight that don't touch uncached memory, and those can probably hide
the long stalls caused by the L3 misses and DRAM accesses.
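As a rough sanity check (all numbers assumed except the ~600 cycles above,
and the toy model ignores any overlap between misses): if everything else
retired at 1 IPC, one fully exposed DRAM miss every N instructions would
give an effective IPC of roughly N / (N + 600), so ~0.2 IPC would already
correspond to only one such miss per ~150 instructions:

/* toy model: IPC ~ window / (window + miss_cycles), assumed numbers only */
#include <stdio.h>

int
main(void)
{
        const double miss_cycles = 600.0;       /* assumed DRAM latency */

        for (int window = 50; window <= 300; window += 50)
                printf("one exposed miss per %3d insns -> IPC ~ %.2f\n",
                    window, window / (window + miss_cycles));
        return (0);
}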

I wonder whether, for this kind of benchmark, pinning all 900k processes to one CPU would make any big difference (excluding the time to fork and tear down the processes).

Pretty cool stuff!

Regards,

 Michael
