On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
For loops HAVE a direct cpu semantic! Do you doubt this?

...

Right. If you're going to keep running your mouth off, how about looking at some disassembly then?

for(auto i=0; i<a.length; i+=strideAmount)

Using ldc -O4 -release for x86_64 processors, the initialiser translates to:

mov byte ptr [rbp + rcx], 0

The comparison translates to:

cmp r13, rcx
ja .LBB0_2

And the increment and store translates to:

mov byte ptr [rbp + rcx], 0
movsxd rcx, eax
add eax, 3

So: it uses four of the most basic instructions you can think of: mov, cmp, j&lt;cond&gt;, and add.
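For anyone wanting to reproduce this, here's a minimal sketch of the kind of function that loop could have come from. The function name, the array's element type, and the zeroing store are my assumptions; the original post only shows the loop header, not its body.

```d
// Hypothetical harness for the loop above: zero every
// strideAmount-th byte of a slice. Inspect the codegen with e.g.:
//   ldc2 -O3 -release -output-s stride.d
void zeroStrided(ubyte[] a, int strideAmount)
{
    for (auto i = 0; i < a.length; i += strideAmount)
        a[i] = 0; // the "mov byte ptr [...], 0" store in the disassembly
}
```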

Now, what, you might ask, are the instructions that a range compiles down to when everything is properly inlined?

The initialisation, since it comes from a function call, pulls its state from the stack.

mov rax, qword ptr [rsp + 16]
movsxd rcx, dword ptr [rsp + 32]

But the comparison looks virtually identical.

cmp rax, rcx
jb .LBB2_4

But how does it do the add? With some register magic.

movsxd rcx, edx
lea edx, [rcx + r9]

Now, what that looks like it's doing to me is combining the pointer load and index increment into those two instructions. That's one instruction fewer than the flat for loop.
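For completeness, here's a sketch of the range-based version under discussion, written with `std.range.stride` and a `foreach` loop. Again, the function name and body are my reconstruction, not from the post; only the disassembly above is original.

```d
import std.range : stride;

void zeroStrided(ubyte[] a, int strideAmount)
{
    // foreach over a lazy strided range; once empty/front/popFront
    // are inlined, LDC emits the cmp/jb/lea sequence shown above
    foreach (ref e; a.stride(strideAmount))
        e = 0;
}
```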

In conclusion: the semantics you talk about are literally some of the most basic instructions in computing, and escaping the confines of a for loop for a foreach loop can let the compiler generate more efficient code than 50-year-old compsci concepts can.
