I ran your test, measured what gets allocated for a couple of the individual
allocations, and then looked at the pointer offsets at the end.
I see 32 bytes being allocated for each new, but at the end the pointers span
a range of over 100 MB.
If I try with new char[12], which would be enough storage for a sign byte
plus the base-256 data, the same test shows 16-byte allocations, and the
pointers span around 69 MB across all the allocations.
So ... it looks to me like the char array implementation would be a significant
improvement in terms of required memory for this particular case.
Here is the code:
import std.stdio;
import core.memory : GC;

void main()
{
    auto data = new uint[][2_000_000];
    foreach (i; 0 .. data.length)
        data[i] = new uint[4];

    auto p0 = data[0].ptr;
    auto sz0 = GC.sizeOf(p0);       // GC block size backing the first array
    auto p1 = data[1].ptr;
    auto sz1 = GC.sizeOf(p1);
    auto pn = data[1_999_999].ptr;  // pointer of the last array
    writeln("p0=", p0, " sz0=", sz0, " p1=", p1, " sz1=", sz1, " pn=", pn);
}
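For comparison, here is a sketch of the char[12] variant described above. The 12-byte payload (one sign byte plus 11 bytes of base-256 digits) is the assumption from the post; everything else mirrors the uint test.

```d
import std.stdio;
import core.memory : GC;

void main()
{
    // Same test, but each element is a 12-byte char array instead of uint[4].
    auto data = new char[][2_000_000];
    foreach (i; 0 .. data.length)
        data[i] = new char[12];

    auto p0 = data[0].ptr;
    auto sz0 = GC.sizeOf(p0);       // expected to land in a smaller GC bin
    auto pn = data[1_999_999].ptr;
    writeln("p0=", p0, " sz0=", sz0, " pn=", pn);
}
```

GC.sizeOf reports the size of the whole GC block backing the pointer, not the requested length, which is why the per-allocation numbers come out as bin sizes (16, 32, ...) rather than 12 or 16 exactly matching the payload.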