While working on a small PR (https://github.com/dlang/phobos/pull/4420), I noticed that D's template computation system has horrific memory consumption (as well as being very slow).

I believe there are several reasons for this:

1) All template instantiations are memoized, even if they're just private internal implementation details. This causes the space complexity for recursive algorithms to generally be just as bad as the time complexity, whereas it should usually be much better.

2) Everything is immutable, which sometimes forces O(N) array algorithms to be replaced with O(N log(N)) tree algorithms. This compounds with (1).

3) Even though D *requires* that all template algorithms be recursive, the recursion limit is actually set very low (500). This forces efficient linear recursion O(N) algorithms to be replaced with wasteful O(N log(N)) binary recursion, unless N is guaranteed to be very small. This compounds with (1), and sometimes (2).

The combination of these issues causes very severe memory consumption and speed problems; as an example `staticSort!(aliasSeq!(iota(N)))`, which should have a time complexity of O(N log(N)) and a space complexity of O(N), instead seems to have a complexity of O(N^2 log(N)) for both time and space. This is awful - worse than any normal sorting algorithm, despite the fact that `staticSort` is based on the normally-very-efficient "merge sort" algorithm.

On my system, for N = 450 the sort takes about 300 ms and consumes 120 MB of memory. (With my pull request this is reduced to 80 ms and 35 MB, but that's still terrible.)

Ultimately, I believe it was a mistake for D to implement a separate, inferior programming language just for templates. However, it is too late to change that now (at least for D2), so I will offer some suggestions as to how memory consumption can be reduced within the current design:

A) Members of a template instantiation should be eagerly evaluated. Once a template has been fully evaluated, any private member which is not referenced by a public one should be deleted.

B) Template instantiations should be deleted after they are no longer accessible (even indirectly) via a top-level declaration.

C) The compiler should store `T...` in such a way that recursively appending N items to the beginning or end of an AliasSeq does not allocate more than O(N log(N)) additional memory. (O(N) is possible with more indirections.)

D) Implement tail recursion optimization for templates. Tail recursion should not count toward the recursion depth limit.

E) Consider eliminating the recursion limit entirely. Given that the template system is Turing Complete and mandates heavy use of recursion, there is no reason to think that a depth of 500 means something has gone wrong, any more than for a `while` loop running 500+ iterations. (Implementing this may require fixing the exponential name growth, though.)

Implementing A, B, and C should get D's template memory consumption under control. Implementing D and E will make template computation more flexible, encouraging people to use it more and find new things to complain about. :-P

Thoughts?

Reply via email to