bertram.felgenhauer:
> This is odd, but it doesn't hurt the inner loop, which only involves
> $wsum01_XPd, and is identical to $wfold_s15t above.
>
> > Checking the asm:
> >     $ ghc -O2 -fasm
> >
> >   sQ3_info:
> >   .LcRt:
> >     cmpq 8(%rbp),%rsi
> >     jg .LcRw
> >     leaq 1(%rsi),%rax
> >     addq %rsi,%rbx
> >     movq %rax,%rsi
> >     jmp sQ3_info
>
> So for some reason ghc ends up doing the (n + 1) addition before the
> (acc + n) addition in this case - this accounts for the extra
> instruction, because both n+1 and n need to be kept around for the
> duration of the addq (which does the acc + n addition).

Yep, well spotted.

> > Checking via C:
> >
> >     $ ghc -O2 -optc-O3 -fvia-C
> >
> > Better code, but still a bit slower:
> >
> >   sQ3_info:
> >     cmpq 8(%rbp), %rsi
> >     jg .L8
> >     addq %rsi, %rbx
> >     leaq 1(%rsi), %rsi
> >     jmp sQ3_info
>
> This code is identical (up to renaming registers and one offset that
> I can't fully explain, but is probably related to a slight difference
> in handling pointer tags between the two versions of the code) to the
> "nice assembly" above.

Indeed, which is gratifying.

> > Running:
> >
> >     $ time ./B
> >     500000000500000000
> >     ./B 1.01s user 0.01s system 97% cpu 1.035 total
>
> Hmm, about 5% slower, are you sure this isn't just noise?
>
> If not noise, it may be some alignment effect. Hard to say.

I couldn't get it under 1s from a dozen runs, so assuming some small
effect with alignment.

Why we get the extra test in the outer loop though, not sure. That's new
too I think -- at least I've not seen that pattern before.

-- Don
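
For readers following the assembly: the source of ./B is not shown in this message, so the following is only a hypothetical sketch of the kind of strict accumulator loop that would compile to an inner loop of this shape. The names sumTo and go are illustrative and are not taken from the thread; a 64-bit Int is assumed, as the x86-64 listings suggest.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical reconstruction, not the program from the thread:
-- sum the integers 1..n with a strict accumulator.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    -- acc and i play the roles of %rbx and %rsi in the quoted listings.
    go :: Int -> Int -> Int
    go !acc !i
      | i > n     = acc                   -- the cmpq/jg exit test
      | otherwise = go (acc + i) (i + 1)  -- acc + n, then n + 1

main :: IO ()
main = print (sumTo 1000000)  -- small bound so it also runs quickly unoptimised
```

With n = 1000000000 the closed form n(n+1)/2 gives 500000000500000000, the value printed in the quoted run. In the -fasm listing the (i + 1) addition is computed first into a temporary, which is where the extra movq comes from.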