Re: Article: Increasing the D Compiler Speed by Over 75%

Dmitry Olshansky Fri, 02 Aug 2013 06:21:13 -0700

31-Jul-2013 22:20, Walter Bright wrote:

On 7/31/2013 8:26 AM, Dmitry Olshansky wrote:

Ouch... to boot it's always aligned by word size, so
key % sizeof(size_t) == 0
...
rendering lower 2-3 bits useless, that would make straight slice lower bits approach rather weak :)


Yeah, I realized that, too. Gotta shift it right 3 or 4 bits.
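
To make the alignment problem concrete, here is a minimal C++ sketch. It is not code from dmd: the helper names and the synthetic key layout are assumptions made purely for illustration. It takes a bucket index by masking the low bits of a key, optionally after shifting right, and counts how many distinct buckets a run of aligned keys actually reaches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical helper (not from dmd): bucket index for a power-of-2 table.
// `shift` low bits are discarded first; shift == 0 is the plain
// "slice the lower bits" approach.
inline std::size_t bucket_shifted(std::uintptr_t key, std::size_t table_size,
                                  unsigned shift) {
    // The mask trick is only valid when table_size is a power of two.
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return (key >> shift) & (table_size - 1);
}

// Count the distinct buckets hit by n synthetic keys spaced `align` bytes
// apart, mimicking word-aligned pointers used as keys.
inline std::size_t distinct_buckets(std::size_t n, std::size_t align,
                                    std::size_t table_size, unsigned shift) {
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < n; ++i) {
        std::uintptr_t key = 0x10000 + i * align;  // made-up base address
        used.insert(bucket_shifted(key, table_size, shift));
    }
    return used.size();
}
```

With 64 keys spaced 8 bytes apart and a 64-entry table, `distinct_buckets(64, 8, 64, 0)` reaches only 8 buckets, while `distinct_buckets(64, 8, 64, 3)` reaches all 64. Real allocations are not laid out this evenly, so a shift alone is only a partial fix.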

And that helped a bit... Anyhow, after applying a more pervasive integer hash, power-of-2 tables stand up to their promise.
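
As a rough illustration of the idea (not the code in the pull request), the sketch below mixes the key with an integer hash before masking it into a power-of-2 table. The mixer is the 32-bit finalizer from MurmurHash3, chosen here only as a well-known example; the hash actually used in the pull may differ, and `mix32` and `bucket_index` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit finalizer from MurmurHash3 (fmix32), used here as a stand-in
// integer hash. It spreads every input bit across the whole word, so the
// low bits are usable even when the key's own low bits are always zero.
inline std::uint32_t mix32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Hypothetical helper: bucket index for a power-of-2 table.
// The mask replaces the integer division that `hash % table_size`
// would need for an arbitrary (e.g. prime) table size.
inline std::size_t bucket_index(std::uint32_t key, std::size_t table_size) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return mix32(key) & (table_size - 1);
}
```

For a power-of-2 size the mask gives the same result as the modulo, so the choice is purely about cost: the table trades the division for a few shifts and multiplies in the mixer.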

The pull that reaps the minor speed benefit over the original (~2% speed gain!):

https://github.com/D-Programming-Language/dmd/pull/2436

Not bad given that _aaGetRvalue takes only a fraction of the time itself.

I failed to see much of any improvement on Win32 though; allocations are dominating the picture.

And to share the joy of having a nice sampling profiler, here is what AMD CodeAnalyst has to say (top X functions by CPU clocks not halted).


Original DMD:

Function                    CPU clocks  DC accesses  DC misses
RTLHeap::Alloc              49410       520          3624
Obj::ledata                 10300       1308         3166
Obj::fltused                6464        3218         6
cgcs_term                   4018        1328         626
TemplateInstance::semantic  3362        2396         26
Obj::byte                   3212        506          692
vsprintf                    3030        3060         2
ScopeDsymbol::search        2780        1592         244
_pformat                    2506        2772         16
_aaGetRvalue                2134        806          304
memmove                     1904        1084         28
strlen                      1804        486          36
malloc                      1282        786          40
Parameter::foreach          1240        778          34
StringTable::search         952         220          42
MD5Final                    918         318

Variation of DMD with pow-2 tables:

Function                    CPU clocks  DC accesses  DC misses
RTLHeap::Alloc              51638       552          3538
Obj::ledata                 9936        1346         3290
Obj::fltused                7392        2948         6
cgcs_term                   3892        1292         638
TemplateInstance::semantic  3724        2346         20
Obj::byte                   3280        548          676
vsprintf                    3056        3006         4
ScopeDsymbol::search        2648        1706         220
_pformat                    2560        2718         26
memcpy                      2014        1122         46
strlen                      1694        494          32
_aaGetRvalue                1588        658          278
Parameter::foreach          1266        658          38
malloc                      1198        758          44
StringTable::search         970         214          24
MD5Final                    866         274          2

This underscores the point that the DMC RTL allocator is the biggest speed detractor. It is "followed" by ledata (could it be due to a linear search inside?) and, surprisingly, the tiny Obj::fltused is draining lots of cycles (is it called that often?).


--
Dmitry Olshansky
