bearophile wrote:
An interesting thread on Reddit:
http://www.reddit.com/r/programming/comments/92tnd/concepts_removed_from_c0x/

Regarding the compilation and use of generic code, different languages work in 
different ways.
D and C++ (and ShedSkin, using full type inference) compile a different function 
for each distinct set of template type arguments.

Java uses a single function: even though it now has generics, type erasure means 
that at run time there's a single function that operates on Objects.

[...]
In theory, D's strategy is the one that leads to the fastest code at runtime (and 
doesn't require a VM as in the C# case), but the binary has to contain all the 
compiled versions of a templated function, and this can grow the binary size a lot. 
A big binary can also be a problem because more code has to pass through the L1 
instruction cache (often just 32 KB), and this can and does reduce performance.

I can see two possible solutions to this problem:
- Many functions in D programs aren't performance-critical. If you "compile" 
one such templated function into a single dynamically typed function, the performance 
of the code doesn't change much, while the binary size is kept low. I think this is 
roughly what the Java HotSpot VM does. LLVM, which is behind LDC, may be able to do 
this, and it could probably also allow templates to be instantiated at run time, but 
this looks better suited to a VM-based language, something different from the current 
D2 design (because it's generally not easy to know at compile time which templates 
must be compiled and which can be implemented with a dynamically typed piece of code; 
you need profile-guided compilation, or a VM that monitors the code at runtime, as in 
the Java/C# case).
- Another possible solution is to find identical asm code shared by the different 
functions generated from a single template, and put each such piece of code only 
once inside the compiled binary. This requires splitting the functions into 
sections, and such sections have to be joined by unconditional jumps. You may 
think such jumps slow down the code (and sometimes they surely do, so you have 
to avoid jumping into the middle of inner loops). Some time ago I read an 
article about this; it showed that this strategy reduces binary size, and the 
resulting reduction in L1 instruction cache misses is usually enough to balance 
the slowdown caused by the jumps (in many examples there was even a performance 
gain).

I commonly find that I need to use templates for integral types, and that's only because of the silly implicit casting rules from C.
For example, initially I write:
foo(long x) { ... }
and this covers long, int, uint, short, ushort, byte, and ubyte.

But ulong needs to be treated specially. So I add:
foo(ulong x) { ... }

But now, thanks to the silly implicit casting rules, foo(1) becomes ambiguous! So I need to add overloads for all of the other integral types. That's ridiculous, so I make foo() a template. And I get code bloat.

If there was a way to say:
foo(explicit ulong x) { ... } // MUST really be ulong, don't allow implicit integral conversions

then a huge fraction of this template-induced code bloat would disappear.

(for 'explicit', substitute any keyword of your choice).
