On Friday, 23 December 2016 at 22:11:31 UTC, Walter Bright wrote:
On 12/23/2016 10:03 AM, hardreset wrote:

For this D code:

    enum SIZE = 100000000;

    void foo(int* a, int* b) {
        int* atop = a + 1000;
        ptrdiff_t offset = b - a;
        for (; a < atop; ++a)
            *a &= *(a + offset);
    }

The following asm is generated by DMD:

            push    EBX
            mov     EBX,8[ESP]
            sub     EAX,EBX
            push    ESI
            cdq
            and     EDX,3
            add     EAX,EDX
            sar     EAX,2
            lea     ECX,0FA0h[EBX]
            mov     ESI,EAX
            cmp     EBX,ECX
            jae     L2C
    L20:    mov     EDX,[ESI*4][EBX]
            and     [EBX],EDX
            add     EBX,4
            cmp     EBX,ECX
            jb      L20
    L2C:    pop     ESI
            pop     EBX
            ret     4

The inner loop is 5 instructions, whereas the one you wrote is 7 instructions (I didn't benchmark it). With some more source code manipulation the divide can be eliminated, but that is irrelevant to the inner loop.
I patched up the prolog code and timed it, and it came out identical to my asm. I tried the ptrdiff C-like code, and that still comes out 20% slower here. I'm compiling with:

    rdmd test.d -O -release -inline

Am I missing something? How do I get the asm output?
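For anyone who wants to try the ptrdiff version, here is a minimal, self-contained sketch that checks it against a plain indexed loop. It is not the actual test.d from this thread. The names `andPtrOffset` and `andIndexed`, the length parameter `n` (the quoted code hard-codes 1000), and the fill values are all assumptions made for illustration. Both arrays are carved out of one buffer so that the pointer subtraction stays inside a single allocation.

```d
import std.stdio : writeln;

// Pointer-offset loop in the style of the quoted code.
// Assumption: the element count is passed as n instead of the fixed 1000.
void andPtrOffset(int* a, int* b, size_t n)
{
    int* atop = a + n;
    ptrdiff_t offset = b - a;      // distance from a to b, in elements
    for (; a < atop; ++a)
        *a &= *(a + offset);       // same as a[i] &= b[i]
}

// Plain indexed loop, used here only to produce a reference result.
void andIndexed(int[] a, const(int)[] b)
{
    foreach (i; 0 .. a.length)
        a[i] &= b[i];
}

void main()
{
    enum n = 1000;

    // One buffer split in two, so a and b belong to the same allocation.
    auto buf = new int[2 * n];
    auto a = buf[0 .. n];
    auto b = buf[n .. $];

    // Arbitrary fill values, chosen only so the AND has something to do.
    foreach (i; 0 .. n)
    {
        a[i] = cast(int)(i * 2654435761u);
        b[i] = cast(int)(i * 40503u + 12345u);
    }

    // Compute the reference result on a copy before a is modified.
    auto expected = a.dup;
    andIndexed(expected, b);

    andPtrOffset(a.ptr, b.ptr, n);

    // Explicit comparison rather than assert, since -release drops asserts.
    writeln(a == expected ? "results match" : "results differ");
}
```

The check only shows that the two loops compute the same thing. Timing either loop would need a much larger array or many repetitions, which this sketch leaves out.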
