On Saturday, 11 May 2019 at 00:32:54 UTC, H. S. Teoh wrote:
> When it comes to performance, I've essentially given up looking
> at DMD output. DMD's inliner gives up far too easily, leading
> to a lot of calls that aren't inlined when they really should
> be, and DMD's optimizer does not have loop unrolling, which
> excludes a LOT of subsequent optimizations that could have been
> applied. I wouldn't base any performance decisions on DMD
> output. If LDC or GDC produces non-optimal code, then we have
> cause to do something. Otherwise, IMO we're just uglifying D
> code and making it unmaintainable for no good reason.
I think this thread is beginning to lose sight of the larger
picture. What I'm trying to achieve is the opt-in continuum that
Andrei mentioned elsewhere on this forum. We can't do that with
the way the compiler and runtime currently interact. So, the
first task, which I'm trying to get around to, is to convert
runtime hooks to templates. Using the compile-time type
information will allow us to avoid `TypeInfo`, therefore classes,
therefore the entire D runtime. We're now much closer to the
opt-in continuum Andrei mentioned previously on this forum. Now
let's assume that's done...
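To make the difference concrete, here is a rough sketch. The names `copyErased` and `copyTyped` are hypothetical and are not the actual druntime hooks. The first function has its type erased and must consult `TypeInfo` at run time; the template version gets the size from `T` at compile time and never touches `TypeInfo`.

```d
import core.stdc.string : memcpy;

// Type-erased style (hypothetical): the size is only known through a
// TypeInfo class instance that is looked up at run time.
void copyErased(void* dst, const(void)* src, const TypeInfo ti)
{
    memcpy(dst, src, ti.tsize);
}

// Template style (hypothetical): T.sizeof is a compile-time constant,
// so no TypeInfo (and therefore no class support) is needed.
void copyTyped(T)(ref T dst, const ref T src) @trusted
{
    memcpy(&dst, &src, T.sizeof);
}
```

Calling `copyErased(&a, &b, typeid(int))` and `copyTyped(a, b)` on two `int` variables should have the same effect; only the second can work without the `TypeInfo` machinery.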
Those new templates will eventually call a very few functions
from the C standard library, memcpy being one of them. Because
the runtime hooks are now templates, we have type information
that we can use in the call to memcpy. Therefore, I want to
explore implementing `void memcpy(T)(ref T dst, const ref T src)
@safe nothrow pure @nogc` rather than `void* memcpy(void*,
const void*, size_t)`. There are some issues here such as
template bloat and compile times, but I want to explore it
anyway. I'm trying to imagine, what would memcpy in D look like
if we didn't have a C implementation clouding and narrowing our
imagination. I don't know how that will turn out, but I want to
explore it.
For LDC we can just do something like this...
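void memcpy(T)(ref T dst, const ref T src) @safe nothrow @nogc pure
{
    version (LDC)
    {
        // View dst and src as byte arrays; the casts go in @trusted blocks.
        auto dstArray = () @trusted { return (cast(ubyte*) &dst)[0 .. T.sizeof]; }();
        auto srcArray = () @trusted { return (cast(const(ubyte)*) &src)[0 .. T.sizeof]; }();
        for (size_t i = 0; i < T.sizeof; i++)
            dstArray[i] = srcArray[i];
    }
}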
LDC is able to see that as memcpy and do the right thing. Also
if the LDC developers want to do their own thing altogether, more
power to them. I don't see anything ugly about it.
However, DMD won't do the right thing. I guess others are
thinking that we'd just re-implement `void* memcpy(void*, const
void*, size_t)` in D and we'd throw in a runtime call to
`memcpy(&dstArray[0], &srcArray[0], T.sizeof)`. That's
ridiculous. What I want to do is use the type information to
generate an optimal implementation (considering size and
alignment) that DMD will be forced to inline with
`pragma(inline, true)`. That implementation can also take into
consideration target features such as SIMD. I don't believe the
code will be complex, and I expect it to perform at least as well
as the C implementation. My initial tests show that it will
actually outperform the C implementation, but that could be a
problem with my tests. I'm still researching it.
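As a rough illustration of what "use the type information" could mean, here is a minimal sketch that picks a copy strategy with `static if` based on `T.sizeof` and `T.alignof`. The name `memcpyTyped` is hypothetical, the size classes are only an assumption, and SIMD paths are left out. I've also left `pragma(inline, true)` off the sketch, since I'm not sure DMD can inline the fallback loop.

```d
// Hypothetical sketch, not a tuned implementation.
void memcpyTyped(T)(ref T dst, const ref T src) @trusted nothrow @nogc pure
{
    static if (T.sizeof == 1)
        *cast(ubyte*) &dst = *cast(const(ubyte)*) &src;
    else static if (T.sizeof == 2 && T.alignof >= 2)
        *cast(ushort*) &dst = *cast(const(ushort)*) &src;
    else static if (T.sizeof == 4 && T.alignof >= 4)
        *cast(uint*) &dst = *cast(const(uint)*) &src;
    else static if (T.sizeof == 8 && T.alignof >= 8)
        *cast(ulong*) &dst = *cast(const(ulong)*) &src;
    else
    {
        // Fallback for odd sizes or weak alignment: plain byte loop.
        auto d = cast(ubyte*) &dst;
        auto s = cast(const(ubyte)*) &src;
        foreach (i; 0 .. T.sizeof)
            d[i] = s[i];
    }
}
```

Every branch is resolved at compile time, so each instantiation contains only one copy strategy. The cost is one instantiation per type, which is the template bloat concern mentioned earlier.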
Now assuming that's done, we now have language runtime
implementations that are isolated from heavier runtime features
(like the `TypeInfo` classes) that can easily be used in -betterC
builds, bare-metal systems programming, etc. simply by importing
them as a header-only library; it doesn't require first compiling
(or cross-compiling) a runtime for linking with your program; you
just import and go. We're now much closer to the opt-in
continuum.
Now what about development of druntime itself? Well, wouldn't it
be nice if we could utilize things like `std.traits`, `std.meta`,
`std.conv`, and a bunch of other stuff from Phobos? Wouldn't it
also be nice if we could use that stuff in DMD itself without
importing Phobos? So let's take the stuff in Phobos that
doesn't need druntime and put it in a library that doesn't
require druntime (i.e. utiliD). Now druntime can import utiliD
and have more idiomatic-D implementations.
But the benefits don't stop there. Bare-metal developers,
microcontroller developers, kernel driver developers, OS
developers, etc... can all use the runtime-less library to
bootstrap their own implementations without having to re-invent or
copy code out of Phobos and druntime.
I'm probably not articulating this vision well. I'm sorry.
Maybe we'll just have to hope I can find the time and energy to
do it myself and then others will finally see from the results.
Or maybe I'll go have a nice helping of crow.
Mike