On Tuesday, 25 April 2017 at 09:09:00 UTC, Ola Fosheim Grøstad wrote:
On Monday, 24 April 2017 at 17:48:50 UTC, Stefan Koch wrote:
[...]
Oh, ok. AFAIK the decoding of indexing modes into micro-ops
(the real instructions the CPU executes internally, not the
architectural op-codes) has no effect on the caching system.
Picking compact addressing modes may, however, shrink the
generated code, which reduces instruction-cache pressure and
speeds up the decoding of op-codes into micro-ops.
If you want to improve cache loads you have to consider when to
use the "prefetch" instructions, but the effect (positive or
negative) varies greatly between CPU generations, so you
basically need to target each CPU generation individually.
That's probably too much work to be worthwhile: it usually
doesn't pay off until you work on large datasets, and then you
usually have to be careful about partitioning the data into
cache-friendly working sets. Probably not so easy to do from a
JIT.
You'll probably get a decent performance boost without worrying
too much about caching in the first implementation anyway. Any
gains in that area could be obliterated by the next CPU
generation... :-/
It's already the case. Intel and AMD (especially for Ryzen)
have strongly discouraged the use of prefetch instructions
since at least Core 2 and Athlon 64. The icache cost rarely
pays off, and the hints very often break the hardware
auto-prefetcher by wasting memory bandwidth.