Sergey Gromov:
> Remove -inline from your compiler options, and #2 compiles and runs
> faster in both D1 and D2 than #1.
> lazy seems to do something funny when -inline is in effect.

You are right, I have tested it on D1.
I think the codepad doesn't use -inline, that's why #2 works there.
#2 also uses less RAM on D1: for example, N=24 requires about 707 MB instead of 
788 MB, and takes about 2.8 s instead of about 3 s.

This is the asm of #2 compiled with D1 using -O -release; it's shorter still 
(but note that there are some other parts that I don't show here):

_D11man_or_boy21aFiLiLiLiLiLiZi comdat
        assume  CS:_D11man_or_boy21aFiLiLiLiLiLiZi
L0:             push    EAX
                push    EBX
                cmp     dword ptr 034h[ESP],0
                jg      L33
                mov     EAX,014h[ESP]
                mov     EDX,018h[ESP]
                mov     EBX,014h[ESP]
                call    EDX
                push    EAX
                sub     ESP,4
                mov     EAX,014h[ESP]
                mov     EDX,018h[ESP]
                mov     EBX,014h[ESP]
                call    EDX
                mov     ECX,EAX
                add     ESP,4
                pop     EAX
                add     EAX,ECX
                jmp short       L3C
L33:            lea     EAX,4[ESP]
                call    near ptr _D11man_or_boy21aFiLiLiLiLiLiZi1bMFZi
L3C:            pop     EBX
                pop     ECX
                ret     02Ch
_D11man_or_boy21aFiLiLiLiLiLiZi ends
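
For readers without the earlier messages at hand, here is a rough sketch of the kind of source that would produce a function with this mangled name (an int plus five lazy int parameters, and a nested function b). It is an assumption based on the usual lazy-parameter formulation of Knuth's man or boy test, not the actual #2 listing, and the small k used in main is only for illustration:

```d
import std.stdio;

// Man or boy test with lazy parameters (a reconstruction, not the
// original #2 from this thread).
int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5) {
    // b shares k with the enclosing call of a, so each call to b
    // decrements the same counter.
    int b() {
        k--;
        return a(k, b(), x1, x2, x3, x4);
    }
    // This matches the shape of the asm: compare k with 0, then either
    // evaluate two lazy arguments and add them, or call b.
    return k <= 0 ? x4 + x5 : b();
}

void main() {
    // k = 10 is the classic small test case; its result is -67.
    writeln(a(10, 1, -1, -1, 1, 0));
}
```

Each lazy argument is passed as a delegate, which is probably why both the memory use and the effect of -inline depend so much on how the compiler handles them.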

--------------------------

I have then tested #1 and #2 without -inline on D2, and the results are very 
different from each other: #1 is very slow and uses a lot of memory, while #2 
(which contains no scope) behaves as on D1, using "only" 707 MB with N=24 and 
working with N=25 too. The asm code is similar to the one I have just shown here.
So compiling #2 without -inline in D2 fulfills my original desire of computing 
up to N=25 with D2 :-)

I presume that -inline uncovers a small bug in DMD that will be fixed. But what 
interests me more now is understanding how to write such fast code in general 
in D2.

Bye,
bearophile
