marlowsd:
>
> I managed to improve this:
>
> Main_mainzuzdszdwfold_info:
> .Lc1lP:
>         addq $32,%r12
>         cmpq 144(%r13),%r12
>         ja .Lc1lS
>         movq %r14,%rax
>         cmpq $1000000000,%rax
>         jne .Lc1lV
>         movq $ghczmprim_GHCziTypes_Dzh_con_info,-24(%r12)
>         movsd %xmm6,-16(%r12)
>         movq $ghczmprim_GHCziTypes_Dzh_con_info,-8(%r12)
>         movsd %xmm5,(%r12)
>         leaq -7(%r12),%rbx
>         leaq -23(%r12),%r14
>         jmp *(%rbp)
> .Lc1lS:
>         movq $32,184(%r13)
>         movl $Main_mainzuzdszdwfold_closure,%ebx
>         addq $-24,%rbp
>         movsd %xmm5,(%rbp)
>         movsd %xmm6,8(%rbp)
>         movq %r14,16(%rbp)
>         jmp *-8(%r13)
> .Lc1lV:
>         addsd .Ln1m2(%rip),%xmm5
>         addsd .Ln1m3(%rip),%xmm6
>         leaq 1(%rax),%r14
>         addq $-32,%r12
>         jmp Main_mainzuzdszdwfold_info
>
>
> from 9 instructions in the last block down to 5 (one instruction fewer
> than gcc). I haven't commoned up the two constant 1's though, that'd
> mean doing some CSE.
>
> On my machine with GHC HEAD and gcc 4.3.0, the gcc version runs in 2.0s,
> with the NCG at 2.3s. I put the difference down to a bit of instruction
> scheduling done by gcc, and that extra constant load.
>
> But let's face it, all of this code is crappy. It should be a tiny
> little loop rather than a tail-call with argument passing, and that's
> what we'll get with the new backend (eventually). LLVM probably won't
> turn it into a loop on its own, that needs to be done before the code
> gets passed to LLVM.

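Reading the assembly above, the worker appears to be nothing more than an
integer counter and two Double accumulators, each bumped by a constant on
every iteration until the counter reaches the limit. A rough C sketch of
the "tiny little loop" it ought to become follows. The names fold_loop and
pair, the start values, and the 1.0 increments are assumptions read off the
listing and the "two constant 1's" remark, not the original benchmark source.

```c
/* Hypothetical stand-in for the GHC worker in the listing.
   Assumption: both floating-point constants are 1.0. */
typedef struct {
    double a;   /* accumulator kept in %xmm5 in the listing */
    double b;   /* accumulator kept in %xmm6 in the listing */
} pair;

static pair fold_loop(long i, long limit, double a, double b)
{
    /* Same exit test as the cmpq/jne pair: stop when i == limit. */
    while (i != limit) {
        a += 1.0;   /* addsd .Ln1m2(%rip),%xmm5 */
        b += 1.0;   /* addsd .Ln1m3(%rip),%xmm6 */
        i += 1;     /* leaq 1(%rax),%r14 */
    }
    /* The two results are returned together, much as the listing
       boxes both doubles before returning. */
    pair r = { a, b };
    return r;
}
```

Calling fold_loop(0, 1000000000, 0.0, 0.0) would correspond to the limit in
the listing. Written this way there is no per-iteration heap check and no
argument shuffling for a tail call, which is the point being made above.
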
Agreed. Ideally the new backend would be (starting to be?) usable about
the time -fvia-C dies? Otherwise there's always going to be something
that gcc spots that the current codegen won't. Then again, killing perl
from the ghc toolchain, and having a funeral/dancing on its grave, would
be satisfying in itself :-)

> Have you looked at this example on x86? It's *far* worse and runs about
> 5 times slower.

x86 scares me.. :)

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users