Re: Optimize my code =)

Robin Tue, 18 Feb 2014 15:41:17 -0800

Hiho,

I am happy to see that I could encourage so many people to discuss this topic, not only to give me interesting answers to my questions but also to analyse and evaluate the features and performance of D.

I have fixed the allocation performance problem via a custom destructor method which manually notifies the GC that its data is garbage, so that the GC can free it in the next cycle and no longer has to determine whether it is still in use. This sped up the allocation tests enormously but improved overall runtime performance only very slightly, because memory is now freed contiguously.
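The method itself is not shown in this post, so the following is only a minimal sketch of what such a manual release could look like. It assumes `core.memory.GC.free`, which releases the block immediately rather than at the next collection; the `dispose` name and the `Matrix` layout are hypothetical.

```d
import core.memory : GC;

struct Matrix
{
    private size_t rows, cols;
    private double[] data;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 0.0;
    }

    /// Hypothetical helper (not from the original code): hands the
    /// buffer back to the GC explicitly instead of waiting for a scan.
    void dispose()
    {
        if (data.ptr !is null)
        {
            GC.free(data.ptr); // assumption: buffer was GC-allocated via `new`
            data = null;       // guard against use after free / double free
        }
    }
}

void main()
{
    auto m = Matrix(1000, 1000);
    m.dispose();
    assert(m.data is null);
    m.dispose(); // second call is a no-op
}
```

This is only safe while no other slice or copy of the struct still refers to the same buffer; with shared data the explicit free would leave dangling slices.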


@Nick Sabalausky:

Why should I remove .dup from the copy constructor? In my opinion this is important to keep both matrices (source and copy) independent from each other. The suggested COW feature for matrices sounds interesting but also weird. I have to read more about that.

Move semantics in D are needed and do exist; however, I think that the D compiler isn't as good as the C++ compiler at determining what is movable and what is not.

Another thing, which is hopefully a bug in the current language implementation of D, is the strange behaviour of the transpose method with its only line of code:


return Matrix(this).transposeAssign();

In C++, for example, this compiles without any problems and results in correctly transposed matrix copies - this works due to the existence of move semantics in C++, one of the coolest features since C++11, which has improved and simplified code enormously in many cases for value types just like structs in D.

I have run several tests in order to check when the move assignment or the move constructors are called in D, and whenever I expected them to be called, the copy constructor or the postblit was called instead, which was frustrating imo.

Perhaps I still haven't quite understood what happens behind the scenes in D, but it can't be the solution to always copy the whole data whenever I could instead have a move of the whole data on the heap.
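One way to probe this is to build the transposed copy in a named local and return that local by value. The sketch below is an assumption about the matrix layout, not the actual code from the library; the `copies` counter and the `transposed` name are hypothetical and exist only to make postblit calls visible.

```d
import std.algorithm : swap;
import std.stdio : writeln;

struct Matrix
{
    size_t rows, cols;
    double[] data;
    static size_t copies; // illustration only: counts postblit calls

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 0.0;
    }

    // Postblit: keeps source and copy independent of each other.
    this(this)
    {
        data = data.dup;
        ++copies;
    }

    // Transposes this matrix and returns a reference to it.
    ref Matrix transposeAssign()
    {
        auto t = new double[data.length];
        foreach (r; 0 .. rows)
            foreach (c; 0 .. cols)
                t[c * rows + r] = data[r * cols + c];
        data = t;
        swap(rows, cols);
        return this;
    }

    // Builds the copy in a named local and returns it by value.
    Matrix transposed() const
    {
        auto result = Matrix(rows, cols);
        result.data[] = data[];
        result.transposeAssign();
        return result;
    }
}

void main()
{
    auto m = Matrix(2, 3);
    m.data[] = [1.0, 2, 3, 4, 5, 6];
    auto t = m.transposed();
    assert(t.rows == 3 && t.cols == 2);
    assert(t.data == [1.0, 4, 2, 5, 3, 6]);
    assert(m.data == [1.0, 2, 3, 4, 5, 6]); // source is untouched
    writeln("postblit calls: ", Matrix.copies);
}
```

Printing the counter shows whether the compiler copied or moved the local on return, which is the kind of test described above.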

Besides that, one suggestion which came up was that I could merge the Dimension module into the Matrix module, as they semantically work together. However, I find it better to have many smaller code snippets instead of fewer bigger ones, and that's why I keep them separate.

I also gave scoped imports a try and hoped that they would reduce the size of my executable and perhaps increase the performance of my program, neither of which was true -> confused. Instead I now have more lines of code and do not see instantly which dependencies the module itself has. So what is the point of scoped imports?

The mixin keyword is also nice to have, but it feels similar to a dirty C macro to me, where text replacement with lower abstraction (unlike templates) takes place. Of course I am wrong and you will teach me why, but atm I have strange feelings about implementing code with mixins. In this special case: perhaps it isn't a wise decision to merge addition with subtraction, and perhaps I can find faster ways to do that which involve more differences between both operations, which would require splitting both methods up again. (theoretical, but it could be)
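For reference, the merged addition/subtraction under discussion could look roughly like the sketch below. It assumes a flat `double[]` storage and the field names shown, which are hypothetical; only the `opBinary` plus string-mixin pattern itself is standard D.

```d
struct Matrix
{
    size_t rows, cols;
    double[] data;

    // One template covers both operators; `op` is known at compile time,
    // so the mixin expands to an ordinary array-wise expression.
    Matrix opBinary(string op)(const Matrix rhs) const
        if (op == "+" || op == "-")
    {
        assert(rows == rhs.rows && cols == rhs.cols, "dimension mismatch");
        auto result = Matrix(rows, cols, new double[data.length]);
        mixin("result.data[] = data[] " ~ op ~ " rhs.data[];");
        return result;
    }
}

void main()
{
    auto a = Matrix(1, 2, [1.0, 2.0]);
    auto b = Matrix(1, 2, [3.0, 5.0]);
    assert((a + b).data == [4.0, 7.0]);
    assert((a - b).data == [-2.0, -3.0]);
}
```

Unlike a C macro, the string is checked as regular D code after expansion, and the template constraint rejects any operator other than `+` and `-`. If the two operations ever diverge, they can still be split into separate overloads.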

Another weird thing is that result ~= text(tabStr, this[r, c]) in the toString method is much slower than the following two lines of code:


result ~= tabStr;
result ~= to!string(this[r, c]);

Does anybody have an answer to this?
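For comparison, here is a small sketch of the two-step variant written with `std.array.appender`. The `formatRow` function and its parameters are hypothetical stand-ins for the inner loop of toString, and nothing here has been benchmarked.

```d
import std.array : appender;
import std.conv : to;

// Hypothetical helper: formats one matrix row, prefixing each cell
// with tabStr, in the same order as the two lines above.
string formatRow(const double[] row, string tabStr)
{
    auto result = appender!string();
    foreach (value; row)
    {
        result.put(tabStr);
        result.put(to!string(value));
    }
    return result.data;
}

void main()
{
    assert(formatRow([1.5, 2.0], "\t") == "\t1.5\t2");
}
```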

I am now thinking that I know the most important things about how to write ordinary (not super) efficient D code - laugh at me if you want :D - and will continue extending this benchmarking library, and whenever I feel bad about a certain algorithm's performance I will knock here in this thread again, you know. :P

At the end of my post I just want to summarize the benchmark history for the matrix multiplication, as I think that it is funny: (All tests for two 1000x1000 matrices!)

- The bloody first implementation of the matrix multiplication (which worked) required about 39 seconds to finish.
- Then I finally found out the optimizing commands for DMD, and the multiplication performance more than doubled, to about 14 seconds.
- I created an account here, and due to your massive help the matrix multiplication required only about 5 seconds shortly after, due to better const, pure and nothrow usage.
- Through the shift from class to struct, some optimizations in memory layout, further usage improvements of const, pure and nothrow, as well as the use of several array features and foreach loops, the algorithm's performance rose once again and it required about 3.7 seconds.
- The last notable optimization was the implementation based on pointer arithmetic, which again improved the performance from 3.7 seconds to roughly 2 seconds.
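To illustrate the last step, here is a minimal sketch of a pointer-based multiplication for row-major matrices. It is not the code from the library: the free function `multiply`, its parameters and the i-k-j loop order are assumptions chosen for the example.

```d
/// Multiplies an n x m matrix `a` by an m x p matrix `b`,
/// both stored row-major in flat arrays (assumed layout).
double[] multiply(const(double)[] a, const(double)[] b,
                  size_t n, size_t m, size_t p)
{
    assert(a.length == n * m && b.length == m * p);
    auto result = new double[n * p];
    result[] = 0.0;

    foreach (i; 0 .. n)
    {
        const(double)* aRow = a.ptr + i * m;  // start of row i in a
        double* outRow = result.ptr + i * p;  // start of row i in result
        foreach (k; 0 .. m)
        {
            immutable aik = aRow[k];
            const(double)* bRow = b.ptr + k * p; // start of row k in b
            foreach (j; 0 .. p)
                outRow[j] += aik * bRow[j];
        }
    }
    return result;
}

void main()
{
    auto c = multiply([1.0, 2, 3, 4], [5.0, 6, 7, 8], 2, 2, 2);
    assert(c == [19.0, 22, 43, 50]);
}
```

The inner loop walks both `bRow` and `outRow` sequentially, which tends to be friendlier to the cache than striding down a column of `b`.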

Due to this development I see that there is still a possibility that we could beat the 1.5 seconds from Java or even the 1.3 seconds from C++! (based on my machine: 64-bit Arch Linux, dual-core 2.2 GHz, 4 GB RAM)

There are still many ways to further improve the performance. For example by using LDC on certain hardware, parallelism, and perhaps by implementing COW with no GC dependencies. And of course I may be missing many other possible optimization features of D.

For myself I can say that I have learned a lot, and that's the most important thing above everything else for me here.

Thank you all for this very interesting conversation! You - the community - are part of what makes D a great language. =)


Robin
