2011/2/1 Elazar Leibovich <elaz...@gmail.com>

>
> See this paper[2] which is referred in the slides.
>

> [2] http://www-plan.cs.colorado.edu/diwan/asplos09.pdf

This morning I took part in a meeting here at work where something related
was discussed, so I am hot for the topic. Apologies all around.

The paper you reference focuses on so-called "measurement bias". The
authors describe the term as "well known to medical and other sciences"; on
another page they mention social sciences, too. The real meaning of
"measurement bias" is "sloppy experimentation where external factors are not
properly taken into account or neutralized, rendering results
irreproducible." Real experimental sciences like physics know the phenomenon
very well and do not consider it a necessary evil. From my point of view,
the title of the paper should have been "When you do obviously wrong things
your results should not surprise you."

I am not saying the paper is worthless; on the contrary, it is certainly
useful to draw attention to the fact that you need to know what you measure
and what affects your measurement, especially in a culture that lacks such
awareness. It is a far cry, though, from saying that computers are "not
deterministic". If you set out to measure A but in fact measure something
else, and that something else depends on various factors that you do not
control, that is sloppy methodology, *not* a law of Nature.
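
To illustrate what "measuring something else" looks like in practice, here
is a minimal sketch (the class and method names are my own invention) of a
naive JVM benchmark whose timed region includes class loading,
interpretation, JIT compilation and GC in addition to the loop it claims to
measure:

    public class NaiveBench {
        static long work(int n) {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                // allocates a String per iteration, so GC activity is
                // part of whatever we end up timing
                sum += Integer.toString(i).hashCode();
            }
            return sum;
        }

        public static void main(String[] args) {
            for (int run = 0; run < 5; run++) {
                long t0 = System.nanoTime();
                work(1_000_000);
                long t1 = System.nanoTime();
                // early runs are dominated by interpretation and JIT
                // compilation, later runs by compiled code plus whatever
                // GC happens to land inside the timed region
                System.out.printf("run %d: %.2f ms%n", run, (t1 - t0) / 1e6);
            }
        }
    }

The run-to-run spread you see here is not the machine being
non-deterministic; it is the benchmark timing things it did not intend to
time.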

Uncontrolled external factors may affect the correctness of your program and
not just performance. The meeting I took part in this morning touched on the
following issue: the details of the environment and the setup of the build
machine are not part of the code that is built on it and are not controlled
in the same manner. As a result, different builds of exactly the same
code need to be carefully checked by QA because they may behave differently
(and their performance may differ). That is a big operational problem that
very few people intuitively expect (same code - why should it be re-tested?). The
problem is, however, not an intrinsic lack of determinism but the presence
of factors that are (erroneously) considered external and are therefore
poorly controlled.

The presentation that you sent highlights another issue: you cannot control
complicated 3rd party software stacks. You wrote some java code that needs a
JVM to run, and the latter has "service" threads that do things like GC, JIT
compilation, serialization/deserialization, marshalling/unmarshalling, etc.
The JVM itself is scheduled, these service threads are scheduled, and their
activity happens at different times from run to run, affecting optimization,
concurrency, and all sorts of other things. From the point of view of a typical
java programmer these are uncontrollable factors. Are they unavoidable? Not
at all. If you need a certain level of control, choose technologies that
allow it.
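
To make the point concrete: the JVM's contribution can at least be made
visible and partly constrained with standard HotSpot options, and the
measured region can be moved past the warm-up phase. A sketch, reusing the
work() method from the earlier snippet (flag spellings vary between JVM
versions; -Xlog:gc is the modern form, -verbose:gc the older one):

    // java -Xms1g -Xmx1g              fixed heap, fewer resize-driven GCs
    //      -XX:+PrintCompilation      log when the JIT compiler kicks in
    //      -Xlog:gc                   log when the collector runs
    //      WarmedBench
    public class WarmedBench {
        public static void main(String[] args) {
            // warm up: let class loading and JIT compilation happen
            // outside the measured region
            for (int i = 0; i < 10; i++) NaiveBench.work(1_000_000);
            System.gc(); // only a hint, but it moves some GC work forward
            long t0 = System.nanoTime();
            NaiveBench.work(1_000_000);
            System.out.printf("warmed run: %.2f ms%n",
                              (System.nanoTime() - t0) / 1e6);
        }
    }

This does not eliminate the service threads, it only tells you when they
interfere - which is exactly the "know what you measure" point above.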

Say you do a lot of messaging. You should realize that java, for instance,
does not allow you much control over memory allocation, and as a result
serialization/deserialization is unavoidable (of course, the marketoids only
stress that java does it automatically, and do not dwell on the question of why
it needs serialization in the first place). This will likely hit your
performance and predictability. If you choose, e.g., C/C++ and good
networking technologies, you can control memory allocation, and therefore
serialization, much better (and even avoid serialization altogether).
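
For a feel of what the java path involves, here is a minimal sketch using
the stock java.io serialization (the message class and its fields are made
up for the example); every send walks the object graph and copies it into a
freshly allocated byte[], none of which the programmer controls:

    import java.io.*;

    class Quote implements Serializable {
        long instrumentId;
        double bid, ask;
    }

    public class SendQuote {
        static byte[] toBytes(Quote q) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(q); // object graph copied into a new buffer
            }
            return bos.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            Quote q = new Quote();
            q.instrumentId = 42; q.bid = 99.5; q.ask = 100.5;
            System.out.println("wire size: " + toBytes(q).length + " bytes");
        }
    }

In C/C++ the same message could be a packed struct written straight from
its own buffer to the socket - no intermediate representation, no extra
allocation, and therefore far more predictable timing.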

-- 
Oleg Goldshmidt | o...@goldshmidt.org
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
