On Tuesday, 5 June 2018 at 19:05:27 UTC, DigitalDesigns wrote:
For loops HAVE a direct cpu semantic! Do you doubt this?

...

Right. If you're going to keep running your mouth off, how about looking at some disassembly then?

for(auto i=0; i<a.length; i+=strideAmount)

Using ldc -O4 -release for x86_64 processors, the initialiser translates to:

mov byte ptr [rbp + rcx], 0

The comparison translates to:

cmp r13, rcx
ja .LBB0_2

And the increment and store translates to:

mov byte ptr [rbp + rcx], 0
movsxd rcx, eax
add eax, 3

So: it uses four of the most basic instructions you can think of: mov, cmp, j&lt;cond&gt;, and add.
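For anyone wanting to reproduce this, here's a minimal sketch of the kind of function that loop could have come from. The function name, the array's element type, and the zeroing store are my assumptions; the original post only shows the loop header, not its body.

```d
// Hypothetical harness for the loop above: zero every
// strideAmount-th byte of a slice. Inspect the codegen with e.g.:
//   ldc2 -O3 -release -output-s stride.d
void zeroStrided(ubyte[] a, int strideAmount)
{
    for (auto i = 0; i < a.length; i += strideAmount)
        a[i] = 0; // the "mov byte ptr [...], 0" store in the disassembly
}
```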

Now, what, you might ask, are the instructions that a range compiles down to when everything is properly inlined?

The initialisation, since it comes from a function call, pulls its state from the stack.

mov rax, qword ptr [rsp + 16]
movsxd rcx, dword ptr [rsp + 32]

But the comparison looks virtually identical.

cmp rax, rcx
jb .LBB2_4

But how does it do the add? With some register magic.

movsxd rcx, edx
lea edx, [rcx + r9]

Now, what that looks like it's doing to me is combining the pointer load and index increment into those two instructions. That's one instruction fewer than the flat for loop.
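For completeness, here's a sketch of the range-based version under discussion, written with `std.range.stride` and a `foreach` loop. Again, the function name and body are my reconstruction, not from the post; only the disassembly above is original.

```d
import std.range : stride;

void zeroStrided(ubyte[] a, int strideAmount)
{
    // foreach over a lazy strided range; once empty/front/popFront
    // are inlined, LDC emits the cmp/jb/lea sequence shown above
    foreach (ref e; a.stride(strideAmount))
        e = 0;
}
```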

In conclusion: the semantics you talk about are literally some of the most basic instructions in computing, and escaping the confines of a for loop for a foreach loop can let the compiler generate more efficient code than 50-year-old compsci concepts can.
