On Wednesday 28 February 2007 23:28 Weldon Washburn wrote:
> On 2/28/07, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
> > Weldon Washburn wrote:
> > > On second thought, the only way I know to implement volatile long
> > > (64-bit) Java variables on ia32 is:
> > >
> > > grab critical section
> > > mov [ecx], low32bits; // to do a write, the code for doing a read is similar
> > > mov [ecx+4], hi32bits;
> > > release critical section
> >
> > Is it possible for 64-bit atomic load stores to use double load/stores
>
> hmm... can you tell us the specific instructions you are suggesting? I see
> quad loads/stores but can't find the double load/store version. I also
> tried to find the guarantees on bus transactions. Somewhere I recall it is
> documented that 4-byte aligned loads/stores are guaranteed to be atomic.
> Maybe there are some new guarantees on 64-bit writes. In any case, we
> would still have to be compatible with existing Pentium III hardware and
> probably have to go with some sort of critical section approach.

Yes, this is true. I hoped that someone would point out exactly whether there
are any 64-bit atomic operations that work with doubles. It seems like there
aren't, because the comments on Ivan's patch in HARMONY-2092 say that it is
enough to change the GC and the class loader to align objects on a 64-bit
boundary for 64-bit loads/stores to work, but only if the interpreter also
uses memory fence instructions.

> > or SSE4 on the processors that have it?
>
> Good point. I recall old versions were really only focused on multimedia.
> And writing multimedia bits to memory is not sensitive to order or
> atomicity. In other words, if you are writing to a frame buffer, speed of
> writes is important but the order the bits hit the buffer is not. Again, I
> looked but could not find the latest info on SSE4 and atomicity.

Actually, it should have been SSE2; I pressed the wrong digit. I just meant
quad loads/stores when I mentioned it.

> > > Some observations:
> > > 1)
> > > Fixing the "volatile long" bug (Harmony-2092) by using critical section as
> > > above should, as a side-effect, allow DekkerTest.java to run.
> > > 2)
> > > Using volatile long sort of, kind of defeats a major reason to use Dekker
> > > algorithm in the first place. Why bother if the performance is the same as
> > > using critical sections?
> > > 3)
> > > Using "volatile int" in DekkerTest.java probably still fails because reads
> > > can pass writes. One way to fix this might be to make the JIT emit r/w
> > > memory fence whenever reading/writing the volatile int. While memory
> > > fences are often cheaper than HW locks, they are not free.
> > > 4)
> > > My guess is that there are no old legacy Java apps that use Dekker
> > > algorithm. In other words, nobody is dependent on Dekker algorithm
> > > working. My guess is that they are, however, dependent on volatile
> > > long and volatile int working properly.
> > > (which has the side effect of making Dekker algo work.)
> > >
> > > On 2/21/07, Weldon Washburn <[EMAIL PROTECTED]> wrote:
> > >> On 2/21/07, Gregory Shimansky <[EMAIL PROTECTED]> wrote:
> > >> > On Wednesday 21 February 2007 21:47 Rana Dasgupta wrote:
> > >> > > Weldon,
> > >> > > But I am not sure why the behavior would be different from J9 on the same
> > >> > > hardware. Do we jit volatiles differently?
> > >>
> > >> The differences in behavior can be caused by lots of things that are not
> > >> related to memory model. For example the JIT might actually emit slightly
> > >> different code. Slightly different code can easily open/close race
> > >> conditions. The important concept is that both J9 and drlvm fail.
> > >> And the failure appears to be because modern hardware is most likely not
> > >> designed to run Dekker's algo without memory fences.
> > >>
> > >> > There is a bug on DRLVM about volatile variables HARMONY-2092. It is about
> > >> > long and double type variables assignments. Is it the same as in
> > >> > Dekker's algorithm?
> > >>
> > >> DekkerTest.java uses "long" variables. Yes, this could change the rate
> > >> of failure but not eliminate failures completely.
> > >>
> > >> > On 2/20/07, Weldon Washburn <[EMAIL PROTECTED]> wrote:
> > >> > > > It seems Dekker's algorithm is not expected to work on SPARC or IA32 SMP
> > >> > > > boxes unless memory fences are used. DekkerTest.java in Harmony-2986
> > >> > > > does not contain memory fences. The volatile keyword guarantees the
> > >> > > > compiler will write a given variable to memory. However, the HW may
> > >> > > > actually have a write buffer and allow reads to pass writes.
> > >> > > > As far as I know, the Java
> > >> > > > language does not provide a means to invoke a memory fence. Thus there
> > >> > > > is no way to fix up DekkerTest.java. I may be misunderstanding something
> > >> > > > here. Does anyone have comment?
> > >> > > >
> > >> > > > An excellent description of the issues involved is in a David Dice
> > >> > > > presentation at:
> > >> > > >
> > >> > > > http://blogs.sun.com/dave/resource/synchronization-public2.pdf
> > >> > > >
> > >> > > > --
> > >> > > > Weldon Washburn
> > >> > > > Intel Enterprise Solutions Software Division
> > >> >
> > >> > --
> > >> > Gregory
> >
> > --
> > Gregory

--
Gregory
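
For reference, the critical-section approach quoted at the top of this
message could be sketched at the Java level roughly as follows. This is only
an illustration under stated assumptions: the class name LockedLong and its
fields are hypothetical and are not taken from DRLVM or from the HARMONY-2092
patch, and a Java monitor stands in for the critical section that the VM
would take around the two 32-bit moves.

```java
// Hypothetical illustration only: not code from DRLVM or HARMONY-2092.
// A 64-bit value is stored as two 32-bit halves, and every access holds
// a lock so that no reader can observe a half-written (torn) value.
final class LockedLong {
    private final Object lock = new Object(); // stands in for the critical section
    private int low32;   // bits 0..31 of the value
    private int high32;  // bits 32..63 of the value

    /** Writes both halves while holding the lock. */
    void set(long value) {
        synchronized (lock) {
            low32 = (int) value;            // like: mov [ecx], low32bits
            high32 = (int) (value >>> 32);  // like: mov [ecx+4], hi32bits
        }
    }

    /** Reads both halves while holding the same lock and reassembles them. */
    long get() {
        synchronized (lock) {
            // Mask the low half so its sign bit does not smear into the high half.
            return ((long) high32 << 32) | (low32 & 0xFFFFFFFFL);
        }
    }

    public static void main(String[] args) {
        LockedLong cell = new LockedLong();
        cell.set(0x123456789ABCDEF0L);
        System.out.println(Long.toHexString(cell.get()));
    }
}
```

Both readers and writers pay for the lock on every access, which is the cost
the quoted observations weigh against using memory fences.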
