I work on C++ optimization all day and deal closely with the Intel compiler, so here is what I have to say. I agree with all the points, but I think 1, 3 and 7 are slightly inaccurate.

1. D knows when data is immutable. C has to always make worst case assumptions, and assume indirectly accessed data mutates.

ICC (like other C++ compilers) has plenty of ways to disambiguate aliasing:
- a pragma to let the optimizer assume no loop dependency
- restrict keyword
- /Qalias-const: assumes a parameter of type pointer-to-const does not alias with a parameter of type pointer-to-non-const.
- GCC-like strict aliasing rule

In most cases I've seen, the "no loop dependency" pragma is downright spectacular and gives the most bang for the buck. Every other method is annoying and barely useful in comparison.
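To make the comparison concrete, here is a minimal sketch of the two approaches on the same loop. I am assuming the "no loop dependency" pragma is spelled `#pragma ivdep` on ICC and `#pragma GCC ivdep` on GCC; the function names are made up for illustration, and `__restrict` is a compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Variant 1: promise at the declaration that dst and src never overlap.
// __restrict is a vendor extension (GCC, Clang, ICC, MSVC), not standard C++.
void scale_restrict(float* __restrict dst, const float* __restrict src,
                    std::size_t n, float k) {
    assert(n == 0 || dst != src);  // cheap sanity check of the no-overlap promise
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

// Variant 2: keep plain pointers and tell the optimizer, for this loop only,
// to ignore assumed loop-carried dependencies. The pragma spelling is an
// assumption about the compiler in use; elsewhere the loop is compiled
// without any hint.
void scale_ivdep(float* dst, const float* src, std::size_t n, float k) {
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

Both variants compute the same result. The difference is where the promise lives: `__restrict` changes the function signature, while the pragma annotates a single loop and leaves the interface alone.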

It's not clear to me which aliasing rules D assumes.

3. Function inlining has generally been shown to be of tremendous value in optimization. D has access to all the source code in the program, or at least as much as you're willing to show it, and can inline across modules. C cannot inline functions unless they appear in the same module or in .h files. It's a rare practice to push many functions into .h files. Of course, there are now linkers that can do whole program optimization for C, but those are kind of herculean efforts to work around that C limitation of being able to see only one module at a time.

This point is not entirely accurate. While the C compilation model generally gets in the way of inlining, with the Intel C++ compiler you can absolutely rely on cross-module inlining when global optimization is enabled. I don't know how it works, but all our tiny functions hidden in separate translation units get inlined. ICC also provides 4 very useful pragmas for optimization: {forcing|not forcing} inlining [recursively] at the call site instead of at the definition. I find them better than any inline/__forceinline at the definition.
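A rough sketch of what call-site control looks like, assuming the ICC spellings `#pragma forceinline` and `#pragma noinline` (adding `recursive` extends the request to the calls made by the callee). The helper and function names are hypothetical, and on other compilers the pragmas are simply compiled out here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tiny helper; in real code it would sit in another
// translation unit and be reached through cross-module optimization.
float scale(float x, float k) { return x * k; }

// Hot path: ask for inlining of the calls in the next statement only.
float sum_scaled_hot(const float* v, std::size_t n, float k) {
    assert(v != nullptr || n == 0);
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__INTEL_COMPILER)
#pragma forceinline
#endif
        acc += scale(v[i], k);
    }
    return acc;
}

// Cold path: same callee, but here we ask the compiler not to inline it.
float scale_once_cold(float x, float k) {
#if defined(__INTEL_COMPILER)
#pragma noinline
#endif
    return scale(x, k);
}
```

The same function gets a different inlining decision depending on who calls it, which a keyword on the definition cannot express.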

7. D's "final switch" enables more efficient switch code generation, because the default doesn't have to be considered.

A good point.
The default: branch can be marked unreachable with most C++ compilers I know of, though people rarely do it. In my experience, ICC performs enough static analysis to drop the switch prelude test on its own. I don't like depending on that, though, since compiler analysis is not something you can rely on for predictable optimization.
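For reference, a minimal sketch of marking the default branch unreachable by hand. The `UNREACHABLE` macro, the enum and the function are illustrative names; `__builtin_unreachable()` (GCC, Clang, ICC on Linux) and `__assume(0)` (MSVC, ICC on Windows) are the intrinsics I am assuming are available.

```cpp
#include <cassert>
#include <cstdlib>

// Pick the compiler's "this point is never reached" intrinsic.
#if defined(_MSC_VER) && !defined(__clang__)
#define UNREACHABLE() __assume(0)
#elif defined(__GNUC__) || defined(__clang__)
#define UNREACHABLE() __builtin_unreachable()
#else
#define UNREACHABLE() std::abort()  // fallback: no optimization hint, just a trap
#endif

enum class Op { Add, Sub, Mul };

int apply(Op op, int a, int b) {
    switch (op) {
        case Op::Add: return a + b;
        case Op::Sub: return a - b;
        case Op::Mul: return a * b;
        default:
            // Debug builds catch a bad value; release builds (NDEBUG) let the
            // compiler assume this branch never runs and skip the range check.
            assert(!"unhandled Op");
            UNREACHABLE();
    }
}
```

Unlike D's final switch, nothing here checks that every enumerator is covered, so reaching the default with the hint active is undefined behavior.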

Would be amazing to have the ICC backend work with a D front-end :)
It kicked my ass so many times.
