Re: Scope storage class

bearophile Wed, 26 Nov 2008 15:20:25 -0800

Sergey Gromov:
> Remove -inline from your compiler options, and #2 compiles and runs
> faster in both D1 and D2 than #1.
> lazy seems to do something funny when -inline is in effect.


You are right, I have tested it on D1.
I think the codepad doesn't use -inline, that's why #2 works there.
#2 also uses less RAM on D1: for example, N=24 requires about 707 MB instead of 
788 MB, and runs in about 2.8 s instead of about 3 s.

This is the asm of #2 compiled with D1 using -O -release; it's shorter still 
(but note there are some other parts that I don't show here):

_D11man_or_boy21aFiLiLiLiLiLiZi comdat
        assume  CS:_D11man_or_boy21aFiLiLiLiLiLiZi
L0:             push    EAX
                push    EBX
                cmp     dword ptr 034h[ESP],0
                jg      L33
                mov     EAX,014h[ESP]
                mov     EDX,018h[ESP]
                mov     EBX,014h[ESP]
                call    EDX
                push    EAX
                sub     ESP,4
                mov     EAX,014h[ESP]
                mov     EDX,018h[ESP]
                mov     EBX,014h[ESP]
                call    EDX
                mov     ECX,EAX
                add     ESP,4
                pop     EAX
                add     EAX,ECX
                jmp short       L3C
L33:            lea     EAX,4[ESP]
                call    near ptr _D11man_or_boy21aFiLiLiLiLiLiZi1bMFZi
L3C:            pop     EBX
                pop     ECX
                ret     02Ch
_D11man_or_boy21aFiLiLiLiLiLiZi ends
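
For readers without the source at hand: the mangled name above decodes to a 
function a taking an int plus five lazy int parameters and returning int, with 
a nested function b that takes no arguments and returns int. A minimal sketch 
of what such a function could look like follows, assuming #2 is the usual 
lazy-parameter form of Knuth's man-or-boy test. The parameter names, the body 
and the D2-style main are assumptions for illustration, not necessarily the 
code that was benchmarked.

```d
import std.stdio : writeln;

// Assumed reconstruction: int a(int, lazy int x5), as the mangled name suggests.
int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5)
{
    // Nested function; it mutates the k of the enclosing call to a.
    int b()
    {
        k--;
        // b() is not evaluated here: it is passed as a lazy argument.
        return a(k, b(), x1, x2, x3, x4);
    }

    // The branch visible in the asm: call b when k > 0, else x4 + x5.
    return k <= 0 ? x4 + x5 : b();
}

void main()
{
    // Small N only; the standard man-or-boy result for k = 10 is -67.
    writeln(a(10, 1, -1, -1, 1, 0));
}
```

Each lazy argument is passed as a hidden delegate, which is why the listing 
loads a pair of values and does an indirect call (call EDX) for each of the 
two operands of the addition.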

--------------------------

I have then tested #1 and #2 without -inline on D2, and the results are very 
different from each other: #1 is very slow and uses a lot of memory, while #2 
(which contains no scope) behaves as on D1, using "only" 707 MB with N=24 and 
working with N=25 too. The asm code is similar to the one I have just shown 
here. So compiling #2 without -inline in D2 fulfills my original desire of 
computing up to N=25 with D2 :-)

I presume that -inline uncovers a small bug in DMD that will be fixed. But what 
interests me more now is to understand how to write such fast code in general 
in D2.

Bye,
bearophile
