My question now is: why is each mutex-based thread-safe variant so slow
compared to a similar Java program? The only hint I could find is
https://blogs.oracle.com/dave/entry/java_util_concurrent_reentrantlock_vs, which
mentions that there is some magic going on underneath.
For the atomic and the non-thread-safe variants, the D solution seems to
be twice as fast as my Java program; for the locked variant, however, the
Java program seems to be 40 times faster.

By the way, I run the code with dub run --build=release

Can you post your timings (both D and Java)? And can you post your Java code?

