So after the last day of DConf was over, back at the Aloft, Brian Schott taught me and Jonathan Crapuchettes about perf (https://perf.wiki.kernel.org/index.php/Main_Page). If you're profiling binaries on Linux, this thing is a must-have, and I have no idea how I'd never heard about it before.
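For anyone who hasn't used it, a typical session looks roughly like the sketch below. The function name `profile` and the binary path in the usage comment are placeholders for illustration, not the exact commands that were run here.

```shell
#!/bin/sh
# Sketch only: profile a binary with perf and list the hottest symbols.
# Usage (hypothetical binary name): profile ./mqtt_broker

profile() {
    bin="$1"

    # Print counters for one run; the "CPUs utilized" figure next to
    # task-clock hints at how much of the time the process sat idle.
    perf stat "$bin"

    # Sample the run and record call graphs (-g) into ./perf.data.
    perf record -g "$bin"

    # Show symbols sorted by sample count, with call chains, as plain text.
    perf report --stdio
}
```

The call graphs from `-g` are what let you see not just that a function is hot, but which callers are responsible for it.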
Anyway, I needed a program to test it out with. The last time I had something that didn't perform as well as I wanted, and where the dmd profiler didn't help me much, was my MQTT broker, so I used it on that. Now, that broker beat the other language implementations in my shootout on throughput by a country mile (http://atilanevesoncode.wordpress.com/2014/01/08/adding-java-and-c-to-the-mqtt-benchmarks-or-how-i-learned-to-stop-worrying-and-love-the-garbage-collector/), but it lagged behind the winner (Java) in throughput-constrained latency.

I put perf to work on the D/vibe.d and Java versions and, unsurprisingly, it told me that the CPU was idle a lot more of the time for D than it was for Java. Digging in deeper, the main hotspot according to perf was pthread_mutex_lock. That surprised me, since the program is single-threaded (and in fact runs slower when multiple threads are used).

vibe.d turned out to be responsible for part of that. I posted on the vibe.d forum and Sonke promptly removed all the pesky locking. Results improved, but D was still 6% behind Java.

I carried on and removed unnecessary array allocations from my code. Before perf I had no idea these were contributing to slowing the program down. With that change, Java beats D by 3%.

I've run out of optimisations to make (I think), and pthread_mutex_lock is still top of the list, called by _D4core4sync5mutex5Mutex4lockMFNeZv (core.sync.mutex.Mutex.lock), which in turn is called by _D2gc2gc2GC6mallocMFmkPmZPv (gc.gc.GC.malloc). The GC is preventing me from beating Java, but not because of collections. It's the locking it does in order to allocate instead!

I don't know about the rest of you, but I definitely didn't see that one coming.

Atila
