On Tue, Apr 01, 2003 at 07:05:17PM +0200, Olaf Titz wrote:
> > The symptom of the heisenbug is having the uniqueId field modified so
> > that the second half is exactly the same as the first half.  Thorough
> > examination of any code that touches that field finds nothing that
> > even modifies it, much less copies over half the bits.  Our best idea
> > as to the cause is a JVM bug in the JIT compiling that some JVMs do.
> > The fact that the corruption only happens occasionally after the node
> > has been running for a while makes even building a workaround (other
> > than what we've done already) all the harder.
> 
> The reason is very clear to anyone who knows the Java language spec.
> In short, it's a race condition: long and double variable accesses are
> not atomic and need synchronization. Whether the corruption actually
> occurs is VM-specific, but a VM that shows it is not buggy.

The problem with this is that AFAIK, the id is never modified once it
has been set in construction...
> 
> This is from a mail I sent to this list last year but apparently was
> swallowed:
> 
> > the request id, leaving only a JVM bug to blame.  The fact that this
> > bug occurs exclusively in nodes running IBM's JVM, and doesn't occur
> > when JIT compilation is disabled, forces us to conclude it's a problem
> > with the JVM.
> 
> And you are really sure that it's not a bug in the Java program?
> This is a long variable (long long does not exist in Java), and the
> JLS (17.4) specifically states that accesses to longs are not atomic
> and thus have to be synchronized.
> 
> Take the following example:
> 
> public class Race extends Thread
> {
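>     // Shared by all threads and deliberately left unprotected.
>     static long cnt = 0L;
> 
>     // Adds 1 to both the high and the low 32-bit word of cnt.
>     static void update() {
>         cnt += 0x100000001L;
>     }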
> 
>     public void run() {
>         for (int i=0; i<65536; ++i) {
>             update();
>         }
>         System.out.println(Long.toHexString(cnt));
>     }
> 
>     public static void main(String[] args) {
>         for (int i=0; i<1000; ++i) {
>             new Race().start();
>         }
>     }
> }
> 
> This should generate only "cnt" values whose high and low words are
> equal, which it does e.g. on this VM:
>  java version "1.2"
>  Classic VM (build Linux_JDK_1.2_pre-release-v2, native threads, sunwjit)
> but not on this one:
>  java version "1.4.1-beta"
>  Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-beta-b14)
>  Java HotSpot(TM) Client VM (build 1.4.1-beta-b14, mixed mode)
> (both from Sun/Blackdown under Linux).
> 
> The accesses on the high and low 32-bit words of cnt may interleave
> between threads. This way it is possible for this variable to acquire
> garbage values even though it is only accessed by proper manipulations
> of the "long" value. Whether this corruption actually occurs is
> implementation specific, but it _may_ happen, and so the long
> variables have to be protected properly (e.g. making "update"
> synchronized in the above example or declaring all longs volatile).
> 
> The fact that almost all Heisenbug occurrences reported here are from
> the same two types of VM, one of which (Sun 1.4 under Linux) exhibits
> the unsynchronized behaviour, strongly suggests that this is indeed
> the reason.
> 
> In short, the bug is in fred. Finding it may be hard though.
> 
> Olaf
> 
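For reference, here is a minimal sketch of the protection the quoted mail suggests (making "update" synchronized), plus a check that the two 32-bit halves still match. The names SafeRace, get, halvesEqual and runAll are made up for this illustration and do not come from fred; the thread count of 8 is arbitrary.

```java
class SafeRace extends Thread {
    // Shared counter; every access goes through a synchronized static method.
    private static long cnt = 0L;

    // Holds the lock on SafeRace.class, so the 64-bit read-modify-write
    // cannot interleave with another thread's access.
    static synchronized void update() {
        cnt += 0x100000001L;
    }

    // Reads are synchronized too, so a reader never sees a half-written value.
    static synchronized long get() {
        return cnt;
    }

    // True when the high and low 32-bit words of v are equal.
    static boolean halvesEqual(long v) {
        return (int) (v >>> 32) == (int) v;
    }

    @Override
    public void run() {
        for (int i = 0; i < 65536; ++i) {
            update();
        }
    }

    // Starts nThreads workers, waits for all of them, returns the counter.
    static long runAll(int nThreads) {
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; ++i) {
            threads[i] = new SafeRace();
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // Keep the interrupt flag set and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return get();
    }

    public static void main(String[] args) {
        long v = runAll(8);
        System.out.println(Long.toHexString(v) + " halvesEqual=" + halvesEqual(v));
    }
}
```

Declaring the field volatile would also make each individual read and write of the long atomic, but "+=" would still be an unsynchronized read-modify-write, so updates could be lost. That is why the sketch uses synchronized instead. None of this says where (or whether) fred has such an unprotected access to the id.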

-- 
Matthew Toseland
[EMAIL PROTECTED]/[EMAIL PROTECTED]
Full time freenet hacker.
http://freenetproject.org/
Freenet Distribution Node (temporary) at 
http://80-192-4-36.cable.ubr09.na.blueyonder.co.uk:8889/X3QtN7zNCUg/
ICTHUS.