On Friday, 21 August 2015 at 09:17:28 UTC, Iain Buclaw wrote:
There's a paper somewhere about optimisations on Intel processors that says that -O2 produces overall better results than -O3 (I'll have to dig it out).
That being said, I recently compared the performance of the datetime library using different algorithms. One function of interest computes the year from a raw time value: D1 had a loop-based implementation that iterated over years until it reached the source raw time, while current Phobos has a loop-free implementation that carefully reduces the day count to a year (roughly in the style of the sketch below). I wrote two tests that iterate over days and call the date-time conversion functions. The test that invoked yearFromDays directly showed the loop-free implementation to be faster, but the bigger test, which performed the full conversion between date and time, showed the loop version to be faster by 5%. Quite unintuitive. Could it be a cache effect? The loop function is smaller, but the whole executable is only 15 KB, so it should fit entirely in the processor cache.
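
To make the comparison concrete, here is a minimal sketch of the two strategies. This is not the actual D1 or Phobos code: the names yearFromDaysLoop and yearFromDaysDirect are hypothetical, and I assume day 0 is 1 January of year 1 in the proleptic Gregorian calendar.

bool isLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// D1-style: walk forward one year at a time until the remaining
// day count fits inside the current year.
int yearFromDaysLoop(int days)
{
    int year = 1;
    while (true)
    {
        immutable len = isLeapYear(year) ? 366 : 365;
        if (days < len)
            return year;
        days -= len;
        ++year;
    }
}

// Loop-free: peel off 400-year, 100-year and 4-year Gregorian
// cycles, clamping where the leap day falls at the end of a cycle.
int yearFromDaysDirect(int days)
{
    int year = 1 + (days / 146_097) * 400;   // 400 years = 146_097 days
    days %= 146_097;

    int centuries = days / 36_524;           // 100 years = 36_524 days
    if (centuries == 4) centuries = 3;       // year 400 of the era is leap
    year += centuries * 100;
    days -= centuries * 36_524;

    year += (days / 1_461) * 4;              // 4 years = 1_461 days
    days %= 1_461;

    int years = days / 365;
    if (years == 4) years = 3;               // leap day ends the 4-year cycle
    return year + years;
}

unittest
{
    // The two strategies must agree over a long span of days.
    foreach (day; 0 .. 200_000)
        assert(yearFromDaysLoop(day) == yearFromDaysDirect(day));
}

The loop version is a handful of instructions with a data-dependent trip count; the direct version is longer, branchier code with several divisions, which is one plausible reason the smaller loop can win once it is inlined into a larger conversion routine.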
