On 10-08-17 04:51 PM, Mathieu Desnoyers wrote:
* David Goulet ([email protected]) wrote:


On 10-08-17 04:24 PM, Mathieu Desnoyers wrote:
* David Goulet ([email protected]) wrote:
On 10-08-17 03:45 PM, Mathieu Desnoyers wrote:
[...]
Yes. The performance degradation caused by cache-line bouncing is _way_
worse than extra cache pressure.


There is something I don't understand here. Correct me if (most likely)
I am wrong.

How is cache line bouncing affected by the cache line size? If I
understand correctly, cache line bouncing is the problem where CPUs share
data and have to fetch it from one another's caches (say, from CPU0 to
CPU7). And, I surely agree, this is costly!

That's ok up to here.


However, if the cache line size we align on is bigger than the real
one, you just lose space... For an arch with a 64-byte cache line, you
lose two lines per aligned structure... How would lowering it to 64
bytes cause cache line bouncing?

Let's take the following example:

A multiprocessor machine with 256 bytes cache line size.
The program is built thinking the cache line size is only 128 bytes.

So we allocate an array of what we hope are per-cpu variables:

   malloc(nr_cpus * sizeof(struct type));

Where struct type is declared with __attribute__((aligned(128)))

So we end up having two structures sharing a cache-line, and these will
bounce between CPUs, even though the structures are not shared: only the
cache-lines are shared, because the structures happen to be on the same
cache line.
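
The example above can be checked with a bit of address arithmetic. This is only a sketch under assumptions: the struct name percpu_data, the helper names and the 128/256 values are made up for illustration, and aligned_alloc() (C11) is used instead of malloc() so that the start of the array is known to sit on a 256-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ASSUMED_LINE 128  /* line size the program was built for (example value) */
#define ACTUAL_LINE  256  /* line size of the hypothetical machine (example value) */

/* Hypothetical per-cpu structure, padded and aligned to the assumed line size. */
struct percpu_data {
    unsigned long counter;
} __attribute__((aligned(ASSUMED_LINE)));

/* Index of the cache line that holds address p, for a given line size. */
static uintptr_t line_of(const void *p, size_t line_size)
{
    assert(line_size > 0);
    return (uintptr_t)p / line_size;
}

/* Count neighbouring array elements that land on the same cache line. */
static int count_shared_pairs(const struct percpu_data *arr, int nr_cpus,
                              size_t line_size)
{
    int shared = 0;

    for (int i = 0; i + 1 < nr_cpus; i++)
        if (line_of(&arr[i], line_size) == line_of(&arr[i + 1], line_size))
            shared++;
    return shared;
}
```

With an array of 4 elements starting on a 256-byte boundary, elements 0 and 1 share one line and elements 2 and 3 share the next when the real line size is 256, while no two elements share a line when it is 128.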

So for allocation of individual objects which are meant to be per-cpu,
e.g. a structure controlling the per-cpu buffer, the allocator can put
one structure next to another (belonging to another cpu), thus causing
cache line bouncing.

This phenomenon is called "false sharing".
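
One possible way to avoid it, sketched here under assumptions rather than taken from any existing code: round each per-cpu slot up to a whole number of cache lines and align the whole array on a line boundary. The helper names (percpu_stride, percpu_alloc, percpu_slot) are hypothetical, and the line size is passed in by the caller since how to obtain it is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round obj_size up to a whole number of cache lines.
 * line_size must be a non-zero power of two. */
static size_t percpu_stride(size_t obj_size, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return (obj_size + line_size - 1) & ~(line_size - 1);
}

/* Allocate nr_cpus slots, each starting on its own cache line.
 * Returns NULL on failure; release with free(). */
static void *percpu_alloc(int nr_cpus, size_t obj_size, size_t line_size)
{
    size_t stride = percpu_stride(obj_size, line_size);

    /* The total size is a multiple of line_size, as aligned_alloc() expects. */
    return aligned_alloc(line_size, (size_t)nr_cpus * stride);
}

/* Address of the slot that belongs to a given cpu. */
static void *percpu_slot(void *base, int cpu, size_t obj_size, size_t line_size)
{
    return (char *)base + (size_t)cpu * percpu_stride(obj_size, line_size);
}
```

The cost is the extra cache pressure discussed earlier: an 8-byte object takes a full 256-byte slot on a machine with 256-byte lines.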


Very nice. That clarifies it, yes!

However, please refer to Intel® 64 and IA-32 Architectures Software
Developer's Manual Volume 3A: System Programming Guide.

http://www.intel.com/Assets/PDF/manual/253668.pdf

P. 527, Table 11-1

• Pentium 4 and Intel Xeon processors (Based on Intel NetBurst
  microarchitecture): 8-KByte, 4-way set associative, 64-byte cache line
  size.
• Pentium 4 and Intel Xeon processors (Based on Intel NetBurst
  microarchitecture): 16-KByte, 8-way set associative, 64-byte cache line
  size.

Dunno why the Linux kernel chooses that for P4. But we definitely have to
handle NUMA systems.


arch_numa.h ... possible?

Mathieu


David

Mathieu


Thanks for your help on that!
David



--
David Goulet
LTTng project, DORSAL Lab.

PGP/GPG : 1024D/16BD8563
BE3C 672B 9331 9796 291A  14C6 4AF7 C14B 16BD 8563



_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
