On Monday, 24 April 2017 at 17:48:50 UTC, Stefan Koch wrote:
On Monday, 24 April 2017 at 11:29:01 UTC, Ola Fosheim Grøstad wrote:

What are scaled loads?

x86 has addressing modes which allow you to multiply an index by one of a fixed set of scale factors (1, 2, 4 or 8) and add it as an offset to the pointer you want to load from, thereby making memory access patterns more transparent to the caching and prefetch systems, as well as reducing the overall code size.
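
To make "scaled load" concrete, here is a minimal sketch in C. The function names are made up for illustration, and the assembly mentioned in the comment is only what an x86-64 compiler typically emits; the exact output depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain array indexing. On x86-64 a compiler can usually fold the
   "index * 4 + base" arithmetic into the load itself, e.g.
       mov eax, [rdi + rsi*4]
   (typical output only, not guaranteed). */
int32_t load_scaled(const int32_t *base, size_t index)
{
    return base[index];
}

/* The same load with the address arithmetic spelled out: this is the
   work the scaled addressing mode does in a single instruction. */
int32_t load_manual(const int32_t *base, size_t index)
{
    const unsigned char *addr =
        (const unsigned char *)base + index * sizeof(int32_t);
    int32_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}
```

Both functions return the same value; the difference is only in how much address arithmetic has to be encoded as separate instructions.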

Oh, ok. AFAIK the decoding of indexing modes into micro-ops (the real instructions used inside the CPU, not the actual op-codes) has no effect on the caching system. It may, however, make the generated code more compact, so that you put less pressure on the instruction cache and speed up the decoding of op-codes into micro-ops.

If you want to improve cache loads you have to consider when to use the "prefetch" instructions, but the effect (positive or negative) varies greatly between CPU generations, so you will basically need to target each CPU generation individually.
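
For reference, a software prefetch hint looks roughly like this in C. This is a sketch assuming GCC or Clang, which provide `__builtin_prefetch`; the function name and `PREFETCH_DISTANCE` are hypothetical, and the distance is exactly the kind of value that would need tuning per CPU generation. For a simple linear scan like this one the hardware prefetcher may already do the job, so the hint can easily be neutral or harmful.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
   A good value depends on the CPU generation and the access pattern. */
#define PREFETCH_DISTANCE 16

long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only hint at addresses that are inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 3 = keep in all
               cache levels (high temporal locality). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The hint does not change the result, only (possibly) the timing, which is why it has to be measured on each target CPU.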

Probably too much work to be worthwhile as it usually doesn't pay off until you work on large datasets and then you usually have to be careful with partitioning the data into cache-friendly working-sets. Probably not so easy to do for a JIT.

You'll probably get a decent performance boost without worrying about caching too much in the first implementation anyway. Any gains in that area could be obliterated in the next CPU generation... :-/




