Chris Kitching wrote a pretty cool bug comment about micro-optimizations,
how they're affected by the Dalvik JIT, and the resultant techniques for
benchmarking [1]. The text is reproduced below - thanks Chris!

Comment 22:
(In reply to Michael Comella (:mcomella) from comment #21)
> What do you mean by incurs the wrath of the JIT? A link will suffice.
>
> What is an optimal way to test a micro-optimized patch?

The JIT looks to optimise hotspots in code, typically just a couple of basic blocks at a time. Tight loops which do something a million times for benchmarking are certainly going to get its attention (as your program will be spending very nearly all its time in that loop).

The JIT is a (fairly poor in Android's case) optimising compiler. It often notices that your test function is a waste of time and optimises it out. You then find that your first three(ish) iterations are slow, and the following 999,997 take no time at all (as they're *just* the print statements). Someone once gave me an r- after doing this. That was a fun conversation :P.
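
(Ed: a minimal sketch of that failure mode, not code from the bug; the
method and numbers are made up. The first loop discards its result, so
once compiled it is dead code and the JIT may remove it, leaving you
timing an empty loop; accumulating into a value that gets printed is the
usual defence:)

    public class Heisenbenchmark {
        static int cheapOp(int x) {
            return (x * 31) ^ (x >>> 7);
        }

        public static void main(String[] args) {
            // Result discarded: a JIT is free to eliminate the call.
            long start = System.nanoTime();
            for (int i = 0; i < 1000000; i++) {
                cheapOp(i);
            }
            System.out.println("dead: " + (System.nanoTime() - start) + " ns");

            // Result consumed and printed: the work stays observable,
            // so it cannot be optimised out.
            int sink = 0;
            start = System.nanoTime();
            for (int i = 0; i < 1000000; i++) {
                sink += cheapOp(i);
            }
            System.out.println("live: " + (System.nanoTime() - start)
                    + " ns (sink=" + sink + ")");
        }
    }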
Even if you avoid the hilarious case, the JIT makes things harder to reason about. The JIT in Dalvik really likes to process just a few basic blocks at a time (you can read a little more about this here, bonus points for finding the video:
http://dl.google.com/googleio/2010/android-jit-compiler-androids-dalvik-vm.pdf).

Its tendency to do this is a bit of a pain when you have a big complex system. It's likely the system doesn't *have* any nice hotspots for the JIT to poke with a stick, but it'll probably have thousands of warm-ish bits. These warm-ish bits won't be quite hot enough to get processed promptly, so may be executed a very large number of times in interpreted mode (and may never be processed at all). It's these warm-spots that benefit particularly well from micro-optimisations, but since their warm-ish-ness isn't preserved under benchmarking in isolation, convincing reviewers you've not lost your mind can be challenging.
To make things even more exciting, the HotSpot JVM usually processes an entire method at a time. If you go run the system on a desktop JVM you'll find that as soon as the whole big function becomes hot, the JIT will eat the whole thing, once again dwarfing your micro-optimisation. You may be able to observe improvement here, but since the HotSpot JIT is a really rather good optimising compiler (in contrast to Dalvik's, which does very little optimisation), it's very likely that it already did your optimisation, along with many many more, at JIT-time.

Just because an optimisation isn't interesting post-JIT isn't a reason to not do it (unless it's hurting readability a lot or something), due to these "warm-spots" I described earlier. In the presence of a method-granularity JIT, particularly one with good optimisation like HotSpot's, such work is less pointful.

Sort-of sane approaches include:
- Do something that causes the whole shaboodle to be executed a large number of times, run this in an instrumenting (not sampling!) profiler, and check the average execution time for the small thing you improved (which will most likely be a number in microseconds, which also upsets reviewers who aren't good at multiplying by a million).

- Run it on a desktop JVM with the JIT turned off (this is correct for certain sorts of optimisations, but you now have to consider the different behaviour of Dalvik vs. the desktop JVM. Different bytecode instructions are differently expensive between the two platforms, which may skew your results).

- Run a large number of iterations first to *ensure* it's been JIT'd, and then make your measurements (see the sketch after this list). This is useful only if you believe your optimisation affects the post-JIT performance. This is more problematic than you might think, however, thanks to such absurdity on some platforms as "dynamic deoptimisation". There's a rather good SO thread discussing this topic here:
  http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java
  That thread also links to this rather fun article:
  https://www.ibm.com/developerworks/java/library/j-jtp02225/
  Be mindful though of the way these are mostly discussing desktop JVMs (HotSpot and friends) which have radically different JIT characteristics to Dalvik. ART is another kettle of fish entirely.

- Check bytecode *length* (see the tooling note at the end of this comment). If you're really stuck, the solution with the least bytecode is *probably* marginally better, as the JIT finds them easier to swallow. (In general, some particularly dense desktop JVMs have been known to give up in the face of extremely large functions.)

- Appeal to theory.

- Electrocute your reviewer.
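
(Ed: a minimal sketch of the warm-up-then-measure approach. The class and
method names are made up, and the iteration counts are illustrative; the
thresholds at which a given VM compiles a method vary, so there is no
guaranteed-safe warm-up count. For the JIT-off approach on a desktop JVM,
the standard interpreter-only flag is java -Xint.)

    public class WarmupBenchmark {
        // Stand-in workload; substitute the code under test.
        static int workUnderTest(int x) {
            return Integer.bitCount(x * 0x9E3779B1);
        }

        public static void main(String[] args) {
            int sink = 0;

            // Warm-up phase: run enough iterations that the method is
            // (hopefully) compiled before the clock starts.
            for (int i = 0; i < 100000; i++) {
                sink += workUnderTest(i);
            }

            // Measurement phase.
            final int iterations = 1000000;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                sink += workUnderTest(i);
            }
            long elapsed = System.nanoTime() - start;

            // Printing sink keeps the loops from being optimised away.
            System.out.println((double) elapsed / iterations
                    + " ns/iteration (sink=" + sink + ")");
        }
    }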
It's a tricky one. I typically use the profiler approach I mentioned in the event I actually want to measure it. Some things may be sufficiently awkward to measure that you stop caring and just appeal to theory (I do that a lot).
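
(Ed: on the bytecode-length check: on a desktop JDK, javap -c SomeClass disassembles a compiled class so you can compare bytecode before and after a change; the Android SDK's dexdump tool plays the analogous role for the .dex files Dalvik actually executes.)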

Comment 23:
I should also highlight the extremely excellent name for the "The JIT optimised out your benchmark" problem: Heisenbenchmark.

Comment 24 (rnewman):
> It's a tricky one. I typically use the profiler approach I mentioned in
> the event I actually want to measure it.
>
> Some things may be sufficiently awkward to measure that you stop caring
> and just appeal to theory (I do that a lot).

On this tangent: it's amusing that one of the first things drummed into performance-oriented engineers is "don't trust your instincts; profile, then optimize".

When it comes down to the behavior of systems like this in the large, you end up having to measure at incredibly coarse granularity -- e.g., time to page load -- because attempting to measure subsystems affects the experiment itself.

At coarse granularity your improvements disappear in the noise, and you end up having to trust to instinct… or its more formal cousin, theory.

(Ed: clipped for relevancy)

Comment 25 (ckitching):
(In reply to Richard Newman [:rnewman] from comment #24)
> On this tangent: it's amusing that one of the first things drummed into
> performance-oriented engineers is "don't trust your instincts; profile,
> then optimize".

You might enjoy this article:
http://www.joshbarczak.com/blog/?p=580

- Mike (:mcomella)

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=732177#c22