Hiho,

I am happy to see that I could encourage so many people to discuss this topic, not only to give me interesting answers to my questions but also to analyse and evaluate the features and performance of D.

I have fixed the allocation performance problem with a custom destructor method which manually notifies the GC that its data is garbage, so that the GC can free it in the next cycle and no longer has to determine whether it is still in use. This sped up the allocation tests enormously but increased overall runtime performance by only a very slight amount, because memory is now freed contiguously.
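For readers who want to try something similar, here is a minimal sketch of one way to hand memory back to the GC explicitly. The names Buffer and release are made up for this example and are not my actual Matrix code. It uses GC.free, which releases the block right away instead of waiting for the next cycle, so treat it as an illustration of the idea only.

```d
import core.memory : GC;

// Hypothetical stand-in for a matrix-like type that owns a GC-allocated array.
struct Buffer
{
    private double[] data;

    this(size_t n)
    {
        data = new double[n];
    }

    size_t length() const { return data.length; }

    // Hypothetical "manual destructor": tells the GC this block is garbage.
    // Only safe if no other slice still refers to the same memory.
    void release()
    {
        if (data.ptr !is null)
        {
            // GC.free ignores interior pointers, and for large arrays
            // data.ptr may not be the start of the block, so look up
            // the base address first.
            GC.free(GC.addrOf(data.ptr));
            data = null;
        }
    }
}

void main()
{
    auto buf = Buffer(1_000_000);
    assert(buf.length == 1_000_000);
    buf.release();
    assert(buf.length == 0);
}
```

The obvious risk is a dangling slice: anything still pointing into the freed block becomes invalid, so this only makes sense when the struct is the sole owner of its data.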

@Nick Sabalausky:
Why should I remove .dup from the copy constructor? In my opinion this is important to keep both matrices (source and copy) independent of each other. The suggested COW feature for matrices sounds interesting but also weird. I have to read more about that. Move semantics in D are needed and they exist; however, I think that the D compiler isn't as good as the C++ compiler at determining what is movable and what is not.

Another thing, which is hopefully a bug in the current implementation of D, is the strange behaviour of the transpose method, which consists of a single line of code:

return Matrix(this).transposeAssign();

In C++, for example, this compiles without any problems and results in correctly transposed matrix copies. This works due to the existence of move semantics in C++, which is one of the coolest features since C++11 and has improved and simplified code enormously in many cases for value types, just like structs in D.

I have run several tests to find out when the move assignment or the move constructors are called in D, and whenever I expected them to be called, the copy constructor or the postblit was called instead, which was frustrating imo. Perhaps I still haven't quite understood what happens behind the scenes in D, but it can't be the solution to always copy the whole data whenever I could instead move the data on the heap.
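To make this kind of test concrete, here is a small sketch that counts postblit calls. Tracked, touch, makeByValue and makeViaRef are hypothetical names for this example, not my Matrix code. As far as I understand it, returning an rvalue or using std.algorithm.mutation.move does not run the postblit, while returning the result of a method that returns by ref does, because that result is an lvalue. Whether this is what happens in my transpose method depends on how transposeAssign is declared, which I haven't shown here.

```d
import std.algorithm.mutation : move;

// Hypothetical value type that counts how often its postblit runs.
struct Tracked
{
    double[] data;
    static int copies;

    this(size_t n) { data = new double[n]; }

    // Postblit: deep-copies the payload and records the copy.
    this(this)
    {
        data = data.dup;
        ++copies;
    }

    // Returns a reference to this object, similar to a chaining method.
    ref Tracked touch() return { return this; }
}

// Returns an rvalue: moved out, no postblit.
Tracked makeByValue(size_t n) { return Tracked(n); }

// Returns the lvalue produced by a ref-returning method: copied.
Tracked makeViaRef(size_t n) { return Tracked(n).touch(); }

void main()
{
    auto a = makeByValue(4);
    assert(Tracked.copies == 0);

    auto b = a;               // copy of an lvalue: postblit runs
    assert(Tracked.copies == 1);

    auto c = move(a);         // explicit move: no postblit, a is reset
    assert(Tracked.copies == 1 && a.data is null);

    auto d = makeViaRef(4);   // ref result returned by value: postblit runs
    assert(Tracked.copies == 2);
}
```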

Besides that, one suggestion which came up was that I could merge the Dimension module into the Matrix module as they are semantically working together. However, I find it better to have many smaller code units instead of fewer bigger ones, and that's why I keep them separate.

I also gave scoped imports a try and hoped that they would reduce the size of my executable and perhaps increase the performance of my program, neither of which happened -> confused. Instead I now have more lines of code and can no longer see at a glance which dependencies the module itself has. So what is the point of scoped imports?
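For reference, this is roughly what a scoped import looks like (describe is a made-up function for this example). As far as I can tell, its effect is on name lookup: the imported symbol is visible only inside that scope, and inside a template the import is only processed when the template is instantiated. It is not a runtime optimization, which would fit what I measured.

```d
// Hypothetical helper showing a function-local (scoped) selective import.
string describe(T)(T value)
{
    import std.conv : to;   // visible only inside describe
    return "value = " ~ value.to!string;
}

void main()
{
    assert(describe(42) == "value = 42");
    // `to` is not visible here unless this scope imports std.conv itself.
}
```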

The mixin keyword is also nice to have, but to me it feels similar to a dirty C macro, where text replacement takes place at a lower level of abstraction (unlike templates). Of course I am wrong and you will teach me why, but atm I have strange feelings about implementing code with mixins. In this special case: perhaps it isn't a wise decision to merge addition with subtraction, and perhaps I will find faster ways to do them which involve more differences between the two operations, which would require splitting the two methods up again. (theoretical, but it could be)
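To show what I mean by merging addition and subtraction, here is a minimal sketch on a tiny hypothetical Vec2 type (not my Matrix struct). One difference from a C macro that I can already see: the mixed-in string has to be a complete, valid D expression and is type-checked like any other code.

```d
// Hypothetical two-component vector, used only to illustrate the pattern.
struct Vec2
{
    double x, y;

    // One template covers both operators; the constraint rejects others.
    Vec2 opBinary(string op)(const Vec2 rhs) const
        if (op == "+" || op == "-")
    {
        // The string is built at compile time and compiled as a D expression.
        return Vec2(mixin("x " ~ op ~ " rhs.x"),
                    mixin("y " ~ op ~ " rhs.y"));
    }
}

void main()
{
    assert(Vec2(1, 2) + Vec2(3, 4) == Vec2(4, 6));
    assert(Vec2(5, 5) - Vec2(2, 3) == Vec2(3, 2));
}
```

If the two operations ever need different implementations, the constraint can be narrowed to one operator and a second opBinary added for the other, so the merge is not a one-way decision.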

Another weird thing is that the line result ~= text(tabStr, this[r, c]) in the toString method is much slower than the following two lines of code:

result ~= tabStr;
result ~= to!string(this[r, c]);

Does anybody have an answer to this?

I now think that I know the most important things about how to write ordinary (not super) efficient D code - laugh at me if you want :D - and will continue extending this benchmarking library. Whenever I feel bad about a certain algorithm's performance, I will knock here in this thread again, you know. :P

At the end of my post I just want to summarize the benchmark history of the matrix multiplication, as I think it is funny: (All tests for two 1000x1000 matrices!)

- The very first working implementation of the matrix multiplication required about 39 seconds to finish.
- Then I finally found out the optimization flags for DMD and the multiplication performance roughly doubled, to about 14 seconds.
- I created an account here and, thanks to your massive help, the matrix multiplication required only about 5 seconds shortly after, due to better usage of const, pure and nothrow.
- Through the shift from class to struct, some optimizations in memory layout, further improvements in the usage of const, pure and nothrow, as well as the use of several array features and foreach loops, the performance improved once again and the algorithm required about 3,7 seconds.
- The last notable optimization was the implementation based on pointer arithmetic, which again improved the performance from 3,7 seconds to roughly 2 seconds.
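To give an idea of what the pointer-based step looks like, here is a simplified sketch for square row-major matrices stored in flat arrays. It is not my actual Matrix code; the function name multiply and the i-k-j loop order are choices made for this example only.

```d
// Hypothetical free function: C = A * B for n x n row-major matrices.
double[] multiply(const(double)[] a, const(double)[] b, size_t n)
{
    assert(a.length == n * n && b.length == n * n);

    auto c = new double[n * n];
    c[] = 0.0;   // new double[] starts as NaN, so clear it first

    foreach (i; 0 .. n)
    {
        double* cRow = c.ptr + i * n;             // start of row i in C
        foreach (k; 0 .. n)
        {
            const double aik = a[i * n + k];
            const(double)* bRow = b.ptr + k * n;  // start of row k in B
            // Pointer indexing skips the bounds checks of slice indexing.
            foreach (j; 0 .. n)
                cRow[j] += aik * bRow[j];
        }
    }
    return c;
}

void main()
{
    auto c = multiply([1.0, 2, 3, 4], [5.0, 6, 7, 8], 2);
    assert(c == [19.0, 22, 43, 50]);
}
```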

Given this development, I see that there is still a chance that we could beat the 1,5 seconds of Java or even the 1,3 seconds of C++! (based on my machine: 64-bit Arch Linux, dual-core 2,2 GHz, 4 GB RAM)

There are still many ways to further improve the performance, for example by using LDC on certain hardware, parallelism, and perhaps by implementing COW with no GC dependencies. And of course I may be missing many other possible optimization features of D.
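As a rough idea of the parallelism option, here is an untested-for-speed sketch using std.parallelism, again on flat row-major arrays rather than my Matrix struct. The function name multiplyParallel is made up for this example. Each task writes to its own row of the result, so no locking should be needed; whether it actually helps on a dual-core machine would have to be measured.

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical parallel variant: rows of C are computed independently.
double[] multiplyParallel(const(double)[] a, const(double)[] b, size_t n)
{
    assert(a.length == n * n && b.length == n * n);

    auto c = new double[n * n];
    c[] = 0.0;

    // The default task pool splits the row indices across worker threads.
    foreach (i; parallel(iota(n)))
    {
        double* cRow = c.ptr + i * n;   // only this iteration writes row i
        foreach (k; 0 .. n)
        {
            const double aik = a[i * n + k];
            const(double)* bRow = b.ptr + k * n;
            foreach (j; 0 .. n)
                cRow[j] += aik * bRow[j];
        }
    }
    return c;
}

void main()
{
    auto c = multiplyParallel([1.0, 2, 3, 4], [5.0, 6, 7, 8], 2);
    assert(c == [19.0, 22, 43, 50]);
}
```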

I can say for myself that I have learned a lot, and that's the most important thing above everything else for me here.

Thank you all for this very interesting conversation! You - the community - are part of what makes D a great language. =)

Robin
