Re: SMP problems (was Re: GNU Classpath 0.12, ..., 1.0)

Robert Lougher Sun, 14 Nov 2004 18:59:39 -0800

Hi,

Further to my previous email, here's a patch to JamVM 1.2.0 that adds
memory barriers on Intel.  Mark, as you've seen the problem with
Eclipse 3, could you give it a test?  If anybody else has seen
problems, I'd be grateful if you could also give it a go.


Thanks,

Rob.

P.S.  I believe the compare_and_swap implementation on Intel was
already correct, as any locked instruction forms a memory barrier.
However, there are a couple of other places where ordering is
important, in particular bytecode rewriting in the interpreter, and
I've added memory barriers there.  I suspect the rewriting is the most
likely cause of the problem, as Mark said that making the VMClass static
methods synchronised slows down the crash (i.e. only one thread can be
in a synchronised method at a time).  The memory barrier itself is a
locked no-op; the sfence, lfence and mfence instructions exist on the
P4 but are not available on all processors.
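As a rough illustration of the ordering described in the P.S., here is a minimal C sketch. It is not the contents of mb-patch: MBARRIER, the Insn layout, the opcode names and run_rewrite_race are all hypothetical. The barrier is a locked no-op on x86 (falling back to the GCC/Clang __sync_synchronize() builtin elsewhere). It sits between writing a resolved operand and publishing the rewritten opcode, and between reading the opcode and reading its operand.

```c
#include <assert.h>
#include <pthread.h>

/* Full memory barrier. On x86 this is a locked no-op: a LOCK-prefixed
 * add of zero to the stack top changes nothing but orders all earlier
 * loads and stores before all later ones, and unlike mfence it runs on
 * processors without SSE2. Other targets use the GCC/Clang builtin. */
#if defined(__x86_64__)
#define MBARRIER() __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define MBARRIER() __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
#define MBARRIER() __sync_synchronize()
#endif

/* Hypothetical instruction slot and opcodes, for illustration only;
 * JamVM's real code layout differs. */
enum { OPC_GETFIELD, OPC_GETFIELD_QUICK };

typedef struct {
    volatile int opcode;
    volatile int operand;   /* resolved field offset, valid once quickened */
} Insn;

static Insn insn = { OPC_GETFIELD, 0 };
static int seen_operand = -1;

/* Resolving thread: write the operand first, then publish the quick
 * opcode. The barrier stops the new opcode becoming visible to another
 * CPU (or being reordered by the compiler) ahead of its operand. */
static void *rewrite(void *arg) {
    insn.operand = *(int *)arg;
    MBARRIER();
    insn.opcode = OPC_GETFIELD_QUICK;
    return NULL;
}

/* Executing thread: once it sees the quick opcode it trusts the operand,
 * so the barrier keeps the operand load from being performed early. */
static void *execute(void *arg) {
    (void)arg;
    while (insn.opcode != OPC_GETFIELD_QUICK)
        ;                                  /* spin until the rewrite lands */
    MBARRIER();
    seen_operand = insn.operand;
    return NULL;
}

/* Runs one rewrite race and returns the operand the executor observed. */
static int run_rewrite_race(int offset) {
    pthread_t exec_thread, rewrite_thread;
    insn.opcode = OPC_GETFIELD;
    insn.operand = 0;
    int rc = pthread_create(&exec_thread, NULL, execute, NULL);
    assert(rc == 0);
    rc = pthread_create(&rewrite_thread, NULL, rewrite, &offset);
    assert(rc == 0);
    (void)rc;
    pthread_join(rewrite_thread, NULL);
    pthread_join(exec_thread, NULL);
    return seen_operand;
}
```

For example, run_rewrite_race(8) returns 8: the executing thread never acts on the quick opcode without also seeing the operand written before it.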

On Sat, 13 Nov 2004 12:07:37 -0500, Chris Pickett
<[EMAIL PROTECTED]> wrote:
> Robert Lougher wrote:
> > Hi all,
> >
> > On Sat, 13 Nov 2004 11:58:53 +0100, Mark Wielaard <[EMAIL PROTECTED]> wrote:
> >
> >>The Eclipse 3 (but not 2) startup problem seems to only happen on an SMP
> >>machine (it disappears when I don't use an SMP kernel; this is on an Intel
> >>hyperthreading system) with jamvm [*]. It works fine with gcj/gij (it
> >>doesn't work anymore with kaffe though since they don't implement
> >>java.lang.ClassLoader.setSigners which we now call).
> >
> >
> > I'm not terribly surprised -- I've never tested JamVM on a real or
> > virtual SMP machine before.  When writing the thin-locking
> > implementation I didn't include any SMP memory barriers, so it's
> > something I've been expecting to hear!  I'll look at including them
> > for the next release.  Mark, would you be willing to do the testing?
> >
> >
> >>Cheers,
> >>
> >>Mark
> >>
> >>[*] Hint for Robert. When inspecting with -verbose I can see that some
> >>classes are [loaded] multiple times. I can slow down crashing a bit by
> >>making various VMClass static methods synchronized, but that is not a
> >>full solution. I think this is a bug in the runtime, which needs to guard
> >>against the same class being defined from multiple threads, and that it is
> >>not completely fixable in our core libraries setup.
> >>
> >
> >
> > I don't think this is the cause.  This can happen even on a
> > uni-processor machine.  Two threads can see a class hasn't been loaded
> > and start to define it.  However, the updating of the loaded class
> > hash table is locked.  One thread will win the race and update the
> > table, the other will find it already there, and discard the one it's
> > just loaded.  This keeps locking to a minimum, and should lead to
> > faster overall behaviour.  The bug is in where the -verbose message
> > is printed: only the thread that wins the race should print it.
> 
> For what it's worth, we've also had SMP problems in SableVM for a while
> now.  They too seem related to thread startup and thread death.  It
> never occurred to me that this might be a Classpath problem since until
> now I thought we were the only ones, but then again it could just be
> that both JamVM and SableVM have equally bad internal locking :(.  I
> tried putting in memory barriers as prescribed by the JSR133 cookbook
> [1], but it didn't make any difference.  In fact, I tried putting a
> StoreLoad barrier in between every single bytecode instruction, and it
> still didn't help.  I haven't tested Eclipse, but will try to (or some
> other SableVM person with a working Eclipse installation could try).
> 
> [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
> 
> SableVM also doesn't have any handling of Java volatiles, which do
> indeed exist in the Classpath threading code.  However, one would think
> that with a barrier in between every single bytecode that this wouldn't
> matter and that something else must be wrong.  We did manage to squash a
> couple of threading bugs when somebody built SableVM on NetBSD (I think...)
> and got compile-time pthread initialization warnings.
> 
> Again, my experience says that this isn't strictly limited to SMP
> machines, but that on uniprocessors the time between context switches is
> so long that it's much harder to catch these heisenbugs.
> 
> I think it would be interesting to hear from VM developers who _don't_
> have problems on SMP machines but had them in the past and somehow
> managed to eliminate them.
> 
> Chris
>
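Rob's description of the class-loading race, quoted above, can be sketched in C roughly as follows. This is a minimal illustration, not JamVM's code: the Class struct, the linear-probing table and the names define_class, add_loaded_class and run_define_race are hypothetical. Defining happens outside the lock, only the table update is serialised, and the -verbose style message is printed only by the thread whose class ends up in the table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64       /* this sketch assumes the table never fills */
#define NUM_THREADS 4

/* Hypothetical class record and loaded-class hash table. */
typedef struct { char name[64]; } Class;

static Class *loaded_table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int verbose_count;   /* number of "[Loaded ...]" lines printed */

static unsigned hash_name(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Only the table update is locked. Returns the Class stored under this
 * name afterwards: cls if we won the race, otherwise the earlier one. */
static Class *add_loaded_class(Class *cls) {
    pthread_mutex_lock(&table_lock);
    unsigned i = hash_name(cls->name);
    while (loaded_table[i] != NULL && strcmp(loaded_table[i]->name, cls->name) != 0)
        i = (i + 1) % TABLE_SIZE;          /* linear probing */
    if (loaded_table[i] == NULL)
        loaded_table[i] = cls;
    Class *winner = loaded_table[i];
    pthread_mutex_unlock(&table_lock);
    return winner;
}

/* Defining happens outside the lock, so several threads may build the
 * same class; losers discard their copy and only the winner prints. */
static Class *define_class(const char *name) {
    Class *cls = malloc(sizeof *cls);
    if (cls == NULL)
        abort();
    snprintf(cls->name, sizeof cls->name, "%s", name);
    Class *winner = add_loaded_class(cls);
    if (winner != cls) {
        free(cls);                         /* lost the race */
        return winner;
    }
    __sync_fetch_and_add(&verbose_count, 1);
    printf("[Loaded %s]\n", name);
    return cls;
}

static void *loader_thread(void *arg) {
    return define_class((const char *)arg);
}

/* Races NUM_THREADS threads defining the same class; returns 1 if they
 * all ended up with the same Class pointer. */
static int run_define_race(const char *name) {
    pthread_t threads[NUM_THREADS];
    void *results[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++)
        if (pthread_create(&threads[t], NULL, loader_thread, (void *)name) != 0)
            abort();
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], &results[t]);
    for (int t = 1; t < NUM_THREADS; t++)
        if (results[t] != results[0])
            return 0;
    return 1;
}
```

Racing four threads on one name leaves a single Class in the table and prints exactly one "[Loaded ...]" line; the losing threads' copies are freed.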

mb-patch
Description: Binary data

_______________________________________________
Classpath mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/classpath
