On Friday, 15 September 2023 at 20:22:50 UTC, Atila Neves wrote:
> An argument could be made that it could/should install the dependencies such that only one `-I` flag is needed.

Indeed, this would be god tier.

> ~190k SLOC (not counting the many dub dependencies) killed dmd on a system with 64GB RAM + 64GB swap after over a minute. Even if it worked, it'd be much, much slower.

What you do with the lines of code is *far* more important than how many there are.

The arsd library has about 219,000 lines of text if you delete the Windows-only and obsolete modules (which I did just so I can actually `dmd *.d` here on my Linux box). That includes comments and such; dscanner --sloc reports about 98,000.

$ wc *.d
<snip>
 218983  870208 7134770 total
$ dscanner --sloc *.d
<snip>
total:  98645

Let's compile it all:

$ /usr/bin/time dmd *.d -L-L/usr/local/pgsql/lib -unittest -L-lX11
5.35user 0.72system 0:06.08elapsed 99%CPU (0avgtext+0avgdata 1852460maxresident)k
0inputs+70464outputs (0major+536358minor)pagefaults 0swaps

That's a little bit slow, over 5 seconds. About 1.3 of those seconds are spent in the linker; the other 4 belong to dmd itself (dmd -c). It also used almost 2 GB of RAM, more than it probably should, but it worked fine.

My computer btw is a budget model circa 2016. Nothing extraordinary about its hardware.

But notice it isn't actually running out of RAM or melting the CPU over a period of minutes, despite being six figures of lines of code by any measure.


On the other hand, compile:

// 20 billion appends, forced to run at compile time (CTFE)
enum a = () {
   string s;
   foreach(i; 0 .. 20_000_000_000)
     s ~= 'a';
   return s;
}();


Don't actually do it, but you can imagine what will happen: six lines that can spin your CPU and explode your memory. Indeed, even just importing this module will cause the same problem, even if the build system tries not to compile it again.

The arsd libs are written - for the most part, there are some exceptions - with compile speed in mind. If I see my build slow down, I investigate why. Most problems like this can be fixed!

In fact, let's take that snippet and talk about it. I had to remove *several* zeroes to make it even work without freezing up my computer: with a 100,000 item loop, it just barely worked. Even 200,000 made it OOM.

But ok, a 100,000 item append:

0.53user 1.52system 0:02.17elapsed 95%CPU (0avgtext+0avgdata 4912656maxresident)k

About 5 GB of RAM devoured by these few lines, taking 2 seconds to run. What are some ways we can fix this? The `~=` operator is actually *awful* in CTFE: its behavior is quadratic (...or worse; I didn't confirm this today, but it is obviously bad, presumably because each append copies the string built so far). So you can fix this pretty easily:

enum string a = () {
   // preallocate the buffer instead of append
   char[] s = new char[](100000);
   foreach(ref ch; s)
     ch = 'a';
   return s;
}();

0.17user 0.03system 0:00.21elapsed 98%CPU (0avgtext+0avgdata 45748maxresident)k 16inputs+1408outputs (0major+21995minor)pagefaults 0swaps

Over 10x faster to compile, 1/100th of the RAM, same result. Real world code is frequently doing more than this example, and rewriting it to work like this might take some real effort... but the results are worth it.
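For instance, here's a minimal sketch of the measure-then-fill shape that rewrite usually takes (the fragments being joined are invented for illustration):

enum string generated = () {
    // hypothetical pieces we want joined at compile time
    string[3] parts = ["foo", "bar", "baz"];

    // pass 1: measure the total length
    size_t len = 0;
    foreach(p; parts)
        len += p.length + 1; // +1 for the separator

    // pass 2: preallocate once, then fill in place
    char[] buf = new char[](len);
    size_t pos = 0;
    foreach(p; parts) {
        buf[pos .. pos + p.length] = p[];
        pos += p.length;
        buf[pos++] = '\n';
    }
    return buf;
}();

Two passes over the data, zero reallocations, instead of one append per piece.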

And btw try this: import this module and check your time/memory stats. Even if the module itself isn't recompiled, since CTFE runs when the module is even just imported, you gain *nothing* from separate compilation!
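A minimal sketch of that experiment (file names invented): put the slow enum above in a module called slow, then compile a do-nothing importer and watch the stats.

// main.d: does nothing except import the module with the big enum
import slow;

void main() {}

Compiling main.d alone still pays the whole CTFE bill, because the enum initializer gets evaluated while the import is analyzed.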

...but there are times when you can gain a LOT from separate compilation in situations like this, if you can move the CTFE into some private thing not exposed in the interface. This requires some work by the lib author too, though, in most cases. An example where you can gain a lot is when something does a lot of internal code generation but exposes a small interface, for example a scripting language wrapper. (Though script wrappers can also be made to compile reasonably efficiently if you:

* preallocate buffers;
* keep your generated functions short - again, the codegen has quadratic behavior, so many small functions work better than one big one - and factor the code well so the generated code is minimal and calls back into generic things, e.g. type erasure (see the sketch below);
* collapse template instances;
* keep CTFE things CTFE-only, using a variety of techniques, so they are not codegened unless they are actually necessary.)
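To illustrate the type erasure point, a hedged sketch (every name here is invented): instead of generating a big marshalling body per wrapped function, the codegen emits only a tiny shim per function, and the fat logic lives in one hand-written, generic place.

import std.variant : Variant;

alias Erased = Variant function(Variant[] args);

// the shared, generic logic is written once by hand, not generated:
// arity checks, error reporting, conversions, etc. would live here
Variant dispatch(Erased fn, Variant[] args) {
    return fn(args);
}

// per wrapped function, the codegen only needs to emit a short shim
Variant add_shim(Variant[] args) {
    return Variant(args[0].get!int + args[1].get!int);
}

void main() {
    assert(dispatch(&add_shim, [Variant(2), Variant(3)]).get!int == 5);
}

Each generated function stays a few lines long, which keeps the quadratic codegen behavior from ever getting a big n to chew on.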

My arsd.script and arsd.cgi can wrap large numbers of functions and classes reasonably well, but that's why programs using them tend to be multi-second builds... just note that's *programs using them*. Separately compiling the libraries doesn't help. You'd have to structure the code to keep those codegen parts internal to a package with a minimal interface; then separately compiling those internal components might win (a sketch of that structure follows).
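Here's a minimal sketch of that structure, assuming a hand-written .di interface file (module and symbol names invented):

// biginternal.d: compiled once, separately; the expensive CTFE
// stays private to this file
module biginternal;

private enum string bigTable = () {
    char[] s = new char[](100_000);
    foreach(ref ch; s)
        ch = 'a';
    return s;
}();

string lookupTable() { return bigTable; }

// biginternal.di: the hand-written interface importers actually see;
// it declares the function but omits the enum initializer, so
// importing it triggers no CTFE at all
module biginternal;

string lookupTable();

Importers link against the separately compiled biginternal.o and never pay for the table's construction in their own builds.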

But this is a fairly niche case. Yes, I know there's one major commercial D user who does exactly this. But that's the exception, not the rule.

