Robin:

the existence of move semantics in C++ is one of the coolest features since C++11, which improved and simplified code enormously in many cases for value types, just like structs in D.

I guess Andrei doesn't agree with you (and move semantics in C++11 is quite hard to understand).


I also gave scoped imports a try and hoped that they would reduce the size of my executable and perhaps increase the performance of my program; neither happened, which left me confused. Instead I now have more lines of code and cannot see at a glance what dependencies the module itself has. So what is the point of scoped imports?

Scoped imports in general can't increase performance. Their main point is to avoid importing modules that are needed only by templated code. So if you don't instantiate the template, the linker has less work to do and the binary is usually smaller (no moduleinfo, etc).
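
A minimal sketch of that pattern, assuming a made-up template named describe (it is not from your code). The import of std.conv lives inside the template body, so it only matters once describe is actually instantiated:

```d
// Hypothetical example: the import is scoped to the template body.
string describe(T)(T value)
{
    import std.conv : to; // needed only if describe is instantiated
    return "value = " ~ value.to!string;
}

void main()
{
    import std.stdio : writeln; // scoped to main
    writeln(describe(42));
}
```

If nothing in the program calls describe, the template is never instantiated. Whether the binary gets measurably smaller depends on the compiler and the modules involved.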


Another weird thing is that the result ~= text(tabStr, this[r, c]) in the toString method is much slower than the two following lines of code:

result ~= tabStr;
result ~= to!string(this[r, c]);

Does anybody have an answer to this?

It doesn't look too weird. In the first case you are allocating and creating larger strings. But I don't think matrix printing is a bottleneck in a program.
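
For comparison, here is a rough sketch of the two variants side by side, plus a third one based on std.array.appender that avoids per-element temporary strings. The function names and the int element type are assumptions for illustration, not taken from your matrix code, and I have not measured them:

```d
import std.array : appender;
import std.conv : text, to;
import std.format : formattedWrite;

// Variant 1: text() builds a temporary string holding both parts,
// which is then appended to result.
string viaText(const int[] row, string tabStr)
{
    string result;
    foreach (x; row)
        result ~= text(tabStr, x);
    return result;
}

// Variant 2: two separate appends; only the number needs a temporary.
string viaTwoAppends(const int[] row, string tabStr)
{
    string result;
    foreach (x; row)
    {
        result ~= tabStr;
        result ~= to!string(x);
    }
    return result;
}

// Variant 3: write everything straight into one growing buffer.
string viaAppender(const int[] row, string tabStr)
{
    auto buf = appender!string();
    foreach (x; row)
    {
        buf.put(tabStr);
        buf.formattedWrite("%s", x);
    }
    return buf.data;
}
```

All three produce the same string, so they can be swapped inside toString and timed against each other.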


- Then I have finally found out the optimizing commands for the DMD

This is a small but common problem. Perhaps worth fixing.


There are still many ways to further improve the performance. For example by using LDC

Latest stable and unstable versions of LDC2, try it:
https://github.com/ldc-developers/ldc/releases/tag/v0.12.1
https://github.com/ldc-developers/ldc/releases/tag/v0.13.0-alpha1


on certain hardware, parallelism and perhaps by implementing COW with no GC dependencies. And of course I may miss many other possible optimization features of D.

Matrix multiplication can be improved a lot by tiling the matrix (or better, by using a cache-oblivious algorithm), using SSE/AVX2, using multiple cores, etc. As a starting point you can try to use std.parallelism. It could speed up your code on 4 cores with a very limited amount of added code.
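
A minimal sketch of the std.parallelism idea, assuming square matrices stored as row-major double[] arrays. The matMul name and this layout are assumptions, not your Matrix type. It is the plain triple loop with only the outer loop parallelized, so no tiling or SIMD:

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical helper: multiplies two n x n row-major matrices.
double[] matMul(const double[] a, const double[] b, size_t n)
{
    auto c = new double[n * n];
    // Each iteration writes only to row i of c, so the rows can be
    // computed independently on the default task pool.
    foreach (i; parallel(iota(n)))
    {
        foreach (j; 0 .. n)
        {
            double sum = 0;
            foreach (k; 0 .. n)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

The only change from the serial version is wrapping the outer range in parallel(). For small matrices the task overhead may outweigh the gain.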

Bye,
bearophile
