On Tue, Apr 01, 2003 at 07:05:17PM +0200, Olaf Titz wrote:
> > The symptom of the heisenbug is having the uniqueId field modified so
> > that the second half is exactly the same as the first half. Thorough
> > examination of any code that touches that field finds nothing that
> > even modifies it, much less copies over half the bits. Our best idea
> > as to the cause is a JVM bug in the JIT compilation that some JVMs do.
> > The fact that the corruption only happens occasionally after the node
> > has been running for a while makes even building a workaround (other
> > than what we've done already) all the harder.
>
> The reason is very clear to anyone who knows the Java language spec.
> In short, it's a race condition: long and double variable accesses are
> not atomic and need synchronization. Whether it really occurs is
> VM-specific but not a bug.
The problem with this is that AFAIK, the id is never modified once it
has been set in construction...
>
> This is from a mail I sent to this list last year but apparently was
> swallowed:
>
> > the request id, leaving only a JVM bug to blame. The fact that this
> > bug occurs exclusively in nodes running IBM's JVM, and doesn't occur
> > when JIT compilation is disabled, forces us to conclude it's a problem
> > with the JVM.
>
> And you are really sure that it's not a bug in the Java program?
> This is a long variable (long long does not exist in Java), and the
> JLS (17.4) specifically states that accesses to longs are not atomic
> and thus have to be synchronized.
>
> Take the following example:
>
> public class Race extends Thread
> {
>     // Shared by all threads, deliberately unsynchronized.
>     static long cnt = 0L;
>
>     static void update() {
>         // Adds 1 to both the high and the low 32-bit word.
>         cnt += 0x100000001L;
>     }
>
>     public void run() {
>         for (int i=0; i<65536; ++i) {
>             update();
>         }
>         System.out.println(Long.toHexString(cnt));
>     }
>
>     public static void main(String[] args) {
>         // Start 1000 threads that all update cnt concurrently.
>         for (int i=0; i<1000; ++i) {
>             new Race().start();
>         }
>     }
> }
>
> This should print only "cnt" values whose high and low words are
> equal, which it does, e.g., on this VM:
> java version "1.2"
> Classic VM (build Linux_JDK_1.2_pre-release-v2, native threads, sunwjit)
> but not on this one:
> java version "1.4.1-beta"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-beta-b14)
> Java HotSpot(TM) Client VM (build 1.4.1-beta-b14, mixed mode)
> (both from Sun/Blackdown under Linux).
>
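
To make "high and low word equal" concrete, here is a minimal sketch of how a printed value could be checked. The class and method names (Words, high, low, halvesEqual) are hypothetical and appear in neither mail; the "torn" constant is a made-up illustration, not a value observed on any VM.

```java
// Hypothetical helper, not part of the Race example or of fred.
final class Words {
    // Upper 32 bits of a 64-bit value.
    static int high(long v) { return (int) (v >>> 32); }

    // Lower 32 bits of a 64-bit value.
    static int low(long v) { return (int) v; }

    // True when both halves match, as every value printed by Race should.
    static boolean halvesEqual(long v) { return high(v) == low(v); }

    public static void main(String[] args) {
        long ok = 3 * 0x100000001L;       // three complete updates
        long torn = 0x0000000400000003L;  // illustrative: halves from different writes
        System.out.println(Long.toHexString(ok) + " " + halvesEqual(ok));     // 300000003 true
        System.out.println(Long.toHexString(torn) + " " + halvesEqual(torn)); // 400000003 false
    }
}
```

An update that is lost entirely still leaves the two halves equal, so a mismatch points at the two 32-bit words having been written or read separately.
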
> The accesses on the high and low 32-bit words of cnt may interleave
> between threads. This way it is possible for this variable to acquire
> garbage values even though it is only accessed by proper manipulations
> of the "long" value. Whether this corruption actually occurs is
> implementation specific, but it _may_ happen, and so the long
> variables have to be protected properly (e.g. making "update"
> synchronized in the above example or declaring all longs volatile).
>
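
The "synchronized" variant suggested above might look like the following sketch. SafeRace, get and runAll are hypothetical names introduced for illustration, and the thread count of 8 is an arbitrary assumption; this is not code from fred.

```java
// A sketch of the Race example with all accesses to cnt under one lock.
final class SafeRace extends Thread {
    private static long cnt = 0L;

    // Static synchronized methods lock on SafeRace.class, so the
    // read-modify-write of cnt is never interleaved between threads.
    static synchronized void update() { cnt += 0x100000001L; }

    static synchronized long get() { return cnt; }

    static synchronized void reset() { cnt = 0L; }

    @Override
    public void run() {
        for (int i = 0; i < 65536; ++i) {
            update();
        }
    }

    // Starts the given number of threads, waits for all of them,
    // and returns the final counter value.
    static long runAll(int threads) {
        reset();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; ++i) {
            workers[i] = new SafeRace();
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return get();
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(runAll(8))); // 8000000080000
    }
}
```

Declaring cnt volatile instead would make each individual read and write of the long atomic, which is enough to rule out mixed halves, but "+=" would remain a separate read and write, so concurrent updates could still be lost.
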
> The fact that almost all Heisenbug occurrences reported here are from
> the same two types of VM, one of which (Sun 1.4 under Linux) exhibits
> the unsynchronized behaviour, strongly suggests that this is indeed
> the reason.
>
> In short, the bug is in fred. Finding it may be hard though.
>
> Olaf
>
--
Matthew Toseland
[EMAIL PROTECTED]/[EMAIL PROTECTED]
Full time freenet hacker.
http://freenetproject.org/
Freenet Distribution Node (temporary) at
http://80-192-4-36.cable.ubr09.na.blueyonder.co.uk:8889/X3QtN7zNCUg/
ICTHUS.
