On Friday, 1 April 2016 at 14:46:42 UTC, Johan Engelen wrote:

Meanwhile, I've implemented hashing of function names and other symbols *for the backend*. Hashing every symbol longer than 100 chars reduces object file size by ~25% for my current testcase (251MB -> 189MB). Hashing symbols in the FE is not possible with my testcase because of std.traits.ParameterStorageClassTuple... :/

See my PR for LDC:
https://github.com/ldc-developers/ldc/pull/1445

"This adds MD5 hashing of symbol names that are longer than the threshold set by -hashthres.

What is very unfortunate is that std.traits depends on the mangled name: it parses the mangled name of a symbol as a string to obtain the symbol's traits. This means that mangling cannot be changed dramatically (e.g. by hashing) at a high level, so the hashing has to be done at a lower level.

Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 34. The identifier is foo and is contained in the struct or class Result.

Symbols that may be hashed:
- functions
- struct/class initializer
- vtable
- typeinfo (needed surgery inside FE code)

The feature is experimental and has been tested on Weka.io's codebase. Compiling with -hashthres=1000 produces a binary less than half the size of the original (201MB vs. 461MB), and I did not observe a significant difference in total build times. A hash threshold of 8000 gives a 229MB binary and 800 gives 195MB, so beyond a certain point lowering the threshold yields little further gain. Linking Weka's code fails with a threshold of 500: phobos contains a few large symbols (one larger than 8kB!) and this PR currently does not disable hashing of symbols that are inside phobos, hence "experimental". Future work could try to figure out whether a symbol is inside phobos or not."
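
To illustrate the naming scheme in the PR description, here is a rough Python sketch. It is not the code from the PR. It assumes that the MD5 is taken over the full original mangled name and that the result is laid out exactly as in the example symbol (module path, L<line>, _<hash>, parent, identifier, trailing Z). The names hashed_symbol and lname are made up for this sketch.

```python
import hashlib


def lname(ident: str) -> str:
    """Encode one identifier D-mangling style: decimal length, then the text."""
    return f"{len(ident)}{ident}"


def hashed_symbol(mangled, module_parts, line, parent, ident, threshold):
    """Return `mangled` unchanged if it is short enough, else a hashed name.

    Assumption: the hash input is the complete original mangled name.
    """
    if len(mangled) <= threshold:
        return mangled
    # An MD5 hex digest is 32 characters; the leading underscore makes it
    # a valid identifier of length 33.
    digest = hashlib.md5(mangled.encode("utf-8")).hexdigest()
    parts = [*module_parts, f"L{line}", f"_{digest}", parent, ident]
    return "_D" + "".join(lname(p) for p in parts) + "Z"


if __name__ == "__main__":
    very_long = "_D" + "3abc" * 50 + "Z"  # stand-in for a huge template symbol
    print(hashed_symbol(very_long, ["one", "two", "three"], 34, "Result", "foo", 100))
```

With this layout every hashed name has a bounded length regardless of how long the original was, and the module, line number and identifier remain readable after demangling.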
