On Fri, 30 Dec 2011 06:51:44 +0100, Vladimir Panteleev
<[email protected]> wrote:
On Thursday, 29 December 2011 at 19:47:39 UTC, Walter Bright wrote:
On 12/29/2011 3:19 AM, Vladimir Panteleev wrote:
I'd like to invite you to translate Daniel Vik's C memcpy
implementation to D:
http://www.danielvik.com/2010/02/fast-memcpy-in-c.html
Challenge accepted.
Ah, a direct translation using functions! This is probably the most
elegant approach, however - as I'm sure you've noticed - the programmer
has no control over what gets inlined.
Examining the assembler output shows that dmd inlines everything except
COPY_SHIFT, COPY_NO_SHIFT, and COPY_REMAINING. The inliner in dmd could
definitely be improved, but that is a problem with the implementation,
not with the language.
This is the problem with heuristic inlining: while great by itself, in a
situation such as this the programmer is left with no choice but to
examine the assembler output to make sure the compiler does what the
programmer wants it to do. Such behavior can change from one
implementation to another, and even from one compiler version to
another. (After all, I don't think we can guarantee that what's
inlined today will be inlined tomorrow.)
For real performance bottlenecks one should always examine the assembly.
For most code, inlining hardly ever matters for the runtime of your
program; focusing on efficient algorithms matters far more.
What really baffles me is that people want control over inlining
but nobody seems to ever have noticed that x64 switch doesn't switch
and x64 vector ops aren't vectorized. Both of which are really
important in performance sensitive code.
Continuing in that vein, please note that neither C nor C++ requires
inlining of any sort. The "inline" keyword is merely a hint to the
compiler. What inlining takes place is completely
implementation-defined, not language-defined.
I think we can agree that the C inline hint is of limited use. However,
major C compiler vendors implement an extension to force inlining.
Generally, I would say that common vendor extensions seen in other
languages are an opportunity for D to avoid a similar mess: such
extensions would not have to be required to be implemented, but when
they are, they would use the same syntax across implementations.
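To illustrate the kind of vendor extension meant here, a minimal C sketch
follows. The macro name FORCE_INLINE and the helper copy_bytes are made up
for this example; __forceinline (MSVC) and __attribute__((always_inline))
(GCC, Clang) are the actual compiler-specific spellings. Note how the same
intent needs a different syntax per vendor:

```c
#include <stddef.h>
#include <stdint.h>

/* FORCE_INLINE is a hypothetical name; each vendor spells the
   underlying extension differently. */
#if defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline /* fallback: a plain hint only */
#endif

/* Hypothetical helper: copy n bytes, one byte at a time. */
FORCE_INLINE void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}
```

On a compiler that matches neither branch, the macro falls back to plain
"inline", which is exactly the hint-only situation described above.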
I wish to note that the D version semantically accomplishes the same
thing as the C version without using mixins or CTFE - it's all
straightforward code, without the abusive preprocessor tricks.
I don't think there's much value in that statement. After all, except
for a few occasional templates (which weren't strictly necessary), your
translation uses few D-specific features. If you were to leave yourself
at the mercy of a C compiler's optimizer, your rewrite would merely be a
testament against C macros, not the power of D.
However, the most important part is: this translation is incorrect. C
macros in the original code provide a guarantee that the code is
inlined. D cannot make such guarantees - even your amended version is
tuned to one specific implementation (and possibly, only a specific
range of versions of it).
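To make the macro-versus-function distinction concrete, here is a small C
sketch. It is not taken from Daniel Vik's code; COPY_BYTES, copy_bytes_fn,
and copy_twice are names invented for illustration. The macro body is
pasted into the caller by the preprocessor, so no call can remain for that
part, while the function version is left to the optimizer:

```c
#include <stddef.h>

/* Hypothetical macro: the preprocessor expands this loop at every use
   site, independent of any optimizer heuristics. */
#define COPY_BYTES(dst, src, n)                                   \
    do {                                                          \
        unsigned char *d_ = (unsigned char *)(dst);               \
        const unsigned char *s_ = (const unsigned char *)(src);   \
        size_t n_ = (n);                                          \
        while (n_--)                                              \
            *d_++ = *s_++;                                        \
    } while (0)

/* The same loop as a function: whether a call instruction remains
   is the compiler's decision. */
static void copy_bytes_fn(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Hypothetical caller using both forms. */
void copy_twice(void *a, void *b, const void *src, size_t n)
{
    COPY_BYTES(a, src, n);    /* expanded textually right here */
    copy_bytes_fn(b, src, n); /* may or may not be inlined */
}
```

Both forms produce the same result; they differ only in who decides
whether the loop ends up inside copy_twice.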