Re: Rewriting byte codes

2002-04-01 Thread Erik Corry


On Sun, Mar 31, 2002 at 08:41:34PM -0700, Patrick Tullmann wrote:
 Erik wrote:
  I'd like to make some changes to Kaffe to make it simpler to do more
  precise GC.
 
 Just for reference, so everyone's on the same page, Kaffe already does
 precise walking of Java objects (see gcFuncs.c:walkObject()).  It does
 not precisely walk stacks.  I believe it walks many native objects
 (like jthreads, etc) conservatively.
 
 The big problem with walking the stack isn't the Java stack as much as
 the native stack.  You could walk the Java parts precisely, and the
 native bits conservatively, but I don't know what you'd win anything
 by doing this.

OK, I'm not so familiar with the way Java interacts with
native code, but why do we need to walk the native bits at
all?  Surely C code doesn't need GC?

 As I understand GC trade-offs, the big win for precise GC is the
 ability to update pointers and thus implement a compacting collector.
 Is there something else you're hoping to get out of precise stack
 walking?

Predictability and speed of GC.

 Another approach to consider is to implement GC-safe points (e.g., on
 method calls and backwards branches in Java code).  Then you only have
 to track and update the stack maps at each safe point,

There's a lot to be said for this, but since you can allocate
unlimited memory in an exception handler, every point that can
throw an exception has to be a safe point, which reduces the
appeal.

-- 
Erik Corry [EMAIL PROTECTED]
Quote only what is necessary.  Delete this signature.  Reply below the quoted text.



Re: Rewriting byte codes

2002-04-01 Thread Erik Corry


On Mon, Apr 01, 2002 at 01:05:50PM +0200, Artur Biesiadowski wrote:
 
 Erik Corry wrote:
 
  There's a lot to be said for this, but since you can allocate
  unlimited memory in an exception handler, every point that can
  throw an exception has to be a safe point [...]
 
 If exception is thrown, you don't care about registers (unless you 
 write-cache locals in registers,

Good point, but you still have the problems with a thread that
has been suspended by the OS at a random point (if you use OS
threads, which you have to if you want to benefit from SMP).

 but it will give problems even without 
 precise gc),

Why?  Not that x86 has enough registers, but it might be interesting
for RISC.  I think Hotspot allocates registers freely to both stack
and local variables.

 You also need to mark which local variables are live and which are 
 dead - because if you treat dead object pointers as live, you are 
 non-precise and possibly unstable (as they can point inside some newly 
 allocated object).

As far as I can see it can only be _either_ non-precise _or_
unstable.  It could be nonprecise in the sense that you might
not free some memory as soon as you could have (but not in the
sense that you can't have a moving collector), but if you don't
collect you can't get pointers inside a new object.  Obviously
imprecision is vastly preferable to instability.

You can get around the imprecision by inserting zeroing ops
at strategic places.  Even if you omit this step the GC
reliability is already much better than what we have now.

 But if you have variable map with local liveness you 
 can as well add object/primitive distinction there and forget about 
 explicit separation in memory :)

If you disambiguate local variables that can be used for
both references and nonreferences, your 'map' is reduced
to a single bitmap per method, and the lookup function the
GC uses can be correspondingly simple.  If you also sort the
locals the map reduces to a single index which is the boundary
between the references and the nonreferences.
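To make that concrete, here is a minimal sketch (hypothetical names, not Kaffe's actual API) of what the GC's scan loop looks like once the locals are sorted so that all reference slots sit below a single boundary index:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-method info: once locals are sorted so that all
 * reference slots come first, the whole locals "map" is one index. */
typedef struct {
    size_t num_ref_locals;  /* locals[0..num_ref_locals-1] are references */
    size_t num_locals;      /* total number of local slots */
} MethodLocalsInfo;

/* Hand every non-null reference slot in one frame to the collector's
 * mark function; primitive slots above the boundary are ignored. */
static size_t scan_frame_locals(const MethodLocalsInfo *info,
                                void **locals,
                                void (*mark)(void *obj))
{
    size_t marked = 0;
    for (size_t i = 0; i < info->num_ref_locals; i++) {
        if (locals[i] != NULL) {
            mark(locals[i]);
            marked++;
        }
    }
    return marked;
}

/* Stand-in mark function for demonstration purposes. */
static void mark_noop(void *obj) { (void)obj; }
```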

I still think you need a full map for the JVM operand stack
and/or registers.  You can make them quite compact if you get
clever about it.  For example you can put a rudimentary
disassembler in so the program can trace what's going on
from the last 'safe point' to the current place.

 AFAIK, hotspot stops the thread, replaces the closest safe points with a trap 
 and lets the thread run until it hits one.  Then it restores the original 
 instruction and voila.

This sounds pretty ugly to me, since it involves lots of writing
to the instruction stream with corresponding I-cache flushes etc.
But it's doable and perhaps simpler than keeping stack maps
around for all points.

 Generally - it is hard.  Not just the stack maps and their representation, 
 but getting everything working with threads in an efficient manner.

Yes, I can see that.

-- 
Erik Corry [EMAIL PROTECTED]



Re: Rewriting the GC

2002-04-01 Thread Erik Corry


On Mon, Apr 01, 2002 at 11:10:18AM -0700, Patrick Tullmann wrote:
 Erik wrote:
   The big problem with walking the stack isn't the Java stack as much as
   the native stack.  You could walk the Java parts precisely, and the
   native bits conservatively, but I don't know what you'd win anything
   by doing this.
  
  OK, I'm not so familiar with the way Java interacts with
  native code, but why do we need to walk the native bits at
  all?  Surely C code doesn't need GC?
 
 All the native methods in Kaffe, the threading system, the core class
 loading --- that's all written in C.  That code uses the same stack as
 the Java code.  That code uses Java object refereneces left and right,
 and stores them on the stack.

Ah, now I understand.

Well, I'm not going to do the work here, so perhaps I should just
shut up, but one way to do this is to have a (per-thread) global
called gc_chain that is available in every function.  Then when
you want to have gc-able structures on the stack you do something
like

struct thingie {
    gcChain *next;
    int signature;
    /* here go the real fields */
};

Then you use it with

{
    struct thingie t;
    memset(&t, 0, sizeof t);      /* zero the real fields */
    t.next = gc_chain;
    t.signature = THINGIESIGNATURE;
    gc_chain = (gcChain *)&t;
    /* now do stuff with t */
}

The GC then has a chain of structures on the stack, each with a
signature that tells it what types are in it.  You can have
signatures called 1reference, 2references, 3references etc.
to allow you to save single pointers (not whole structs) on the
stack too.
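To illustrate, here is one way the collector's walk over such a chain could look. This is a sketch under stated assumptions: the SIG_*_REF values and the fixed-size refs array are made up for the example, not Kaffe code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative signatures: the value doubles as the ref count here. */
#define SIG_1_REF   1
#define SIG_2_REFS  2
#define SIG_3_REFS  3

typedef struct gcChain {
    struct gcChain *next;
    int signature;     /* in this sketch: how many refs[] slots are used */
    void *refs[3];     /* up to SIG_3_REFS raw object pointers */
} gcChain;

/* Walk the per-thread chain of stack-allocated entries and pass every
 * non-null reference to the collector's mark function. */
static size_t walk_gc_chain(const gcChain *chain, void (*mark)(void *obj))
{
    size_t count = 0;
    for (; chain != NULL; chain = chain->next) {
        for (int i = 0; i < chain->signature; i++) {
            if (chain->refs[i] != NULL) {
                mark(chain->refs[i]);
                count++;
            }
        }
    }
    return count;
}

/* Stand-in mark function for demonstration purposes. */
static void mark_chain_noop(void *obj) { (void)obj; }
```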

For this to work you can't allow GC at random points in the
code (since things could be in registers), but every so often
(ie back edges etc.) the C code has to call checkgc(gc_chain)
(the parameter will ensure that all registers are flushed to
memory) which will check whether the other threads are waiting
to do a GC.  The gc_chain should also be a parameter to the
allocation routine so we can always do a GC in there.

I guess we have to have the ability to walk forward to the
next safe point, and since that's the same as what we need
in Java code I guess that's the way to do it.

I haven't thought through all details of this idea, but it
seems to me it should work.

   As I understand GC trade-offs, the big win for precise GC is the
   ability to update pointers and thus implement a compacting collector.
   Is there something else you're hoping to get out of precise stack
   walking?
  
  Predictability and speed of GC.
 
 You're talking about bogus pointer references in a conservative scan
 being viewed as pointers, when in fact they're just integers that look
 like pointers?

Actually, I was just talking about compacting/generational/semispace
GCs, which require the ability to move objects, and which are all
faster than nonmoving GCs because they only treat a small part
of the heap on any given GC.

 nice ones) a system that supports safe points is (and I'm somewhat
 guessing here) much easier to write safe native code in --- you don't
 have to worry about your C code getting interrupted by the GC at
 arbitrary points, only at safe points you explicitly insert in your
 code.  There are downsides, of course --- like if you forget to put a
 safe point in somewhere, the GC can be blocked for a long time.

I agree now.

-- 
Erik Corry [EMAIL PROTECTED]



Re: Rewriting byte codes

2002-04-01 Thread Erik Corry


On Tue, Apr 02, 2002 at 02:48:36AM +0200, Artur Biesiadowski wrote:
 
 Erik Corry wrote:
 
  AFAIK, hotspot stops the thread, replaces the closest safe points with a trap 
  and lets the thread run until it hits one.  Then it restores the original 
  instruction and voila.
 
 
 This sounds pretty ugly to me, since it involves lots of writing
 to the instruction stream with corresponding I-cache flushes etc.
 
 I think that the I-cache flush penalty is negligible compared to the cost of 
 switching threads a few times and performing gc.  We are talking about an 
 extra 20-50 cycles, and a gc will take about 1ms minimum (which is on the 
 order of 10^6 cycles).

The real cost of an I-cache flush isn't the 20-50 cycles it
takes to flush the I-cache, it's the time it takes to refill
it again from L2.  But it's probably the right tactic anyhow.
The alternative is to put checks in at every safe point so
you can just set a global instead of having to rewrite code.
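The polling alternative can be sketched like this. It is a toy single-threaded model with made-up names (gc_requested, gc_safepoint); a real VM would use atomic operations and would block the thread inside gc_safepoint() rather than just count:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag the collector sets when it wants every thread to stop.  In a
 * real multithreaded VM this would need to be an atomic variable. */
static volatile bool gc_requested = false;
static int gc_pauses = 0;   /* how many times this thread stopped */

/* Poll inserted at every safe point, instead of patching a trap
 * instruction into the code stream. */
static void gc_safepoint(void)
{
    if (gc_requested) {
        /* A real VM would publish this thread's stack map and block
         * until the collection finishes; here we just count. */
        gc_pauses++;
        gc_requested = false;
    }
}

/* Example: the poll sits on the loop's backward branch, one of the
 * classic safe-point locations. */
static long sum_to(long n)
{
    long s = 0;
    for (long i = 1; i <= n; i++) {
        s += i;
        gc_safepoint();   /* safe point on the back edge */
    }
    return s;
}
```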

-- 
Erik Corry [EMAIL PROTECTED]



Rewriting byte codes

2002-03-31 Thread Erik Corry


Hi

I'd like to make some changes to Kaffe to make it simpler to
do more precise GC.  What would be needed at a minimum would
be:

* Split local variables that can be either reference or non-reference
  into two.  (The alternative is to keep track of when they hold which kind of value.)
* Keep track of which stack positions are references when we do a
  call or new.  Luckily the stack is zapped on an exception so we
  don't have trouble there.
* Make copies of jsr code so that each basic block can only be called
  from one jsr call point chain (default the null chain of course)

There's an obvious problem that when we split a variable, one
of the new variables could end up with an index above 255.  Similarly
we could end up with more than 65536 bytes of code in a method.
In order to fix this a little byte code rewriting is inevitable.  I
would suggest we rewrite the byte codes such that:

* All insns are wide (16 bit index to local variables), and we drop
  the wide prefix
* All inline constants are native-endian
* All pc indexes are 32 bits and of course native endian (mostly a
  problem with the exception table I think, also replace GOTO with
  GOTO_W throughout)
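For illustration, the operand conversion might look like this (helper names are invented; a real rewriter runs over a whole method body and also patches branch offsets):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Class files store multi-byte operands big-endian; reading one the
 * portable way costs shifts on every dispatch. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Rewrite a one-byte local-variable index into the uniform wide form:
 * a 16-bit index stored in native byte order, so the interpreter can
 * load it directly. */
static void widen_local_index(uint8_t narrow_operand, uint8_t out[2])
{
    uint16_t idx = narrow_operand;   /* zero-extend to 16 bits */
    memcpy(out, &idx, sizeof idx);   /* native byte order */
}
```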

Copying jsr code is very nice because it means all problems related
to jsrs just disappear.  We can replace jsrs and rets with gotos.
You are not allowed to recurse with jsr (though you are allowed
nesting) so semantics should be preserved.

In theory it can give exponential code expansion, in practice I
doubt it does.  I don't think you can construct an example in
Java that gives exponential code expansion, and if you had byte
code that did that, it would take exponential time to verify, so
it's a bad idea anyway.

Any comments?  I'm not guaranteeing I'll have time, but right
now it looks as though I may need it for something else anyway.

-- 
Erik Corry



Re: Rewriting byte codes

2002-03-31 Thread Erik Corry


On Sun, Mar 31, 2002 at 03:40:48PM -0800, Jim Pick wrote:
 
 On Sun, Mar 31, 2002 at 09:02:46PM +0200, Erik Corry wrote:
  
  Hi
  
  I'd like to make some changes to Kaffe to make it simpler to
  do more precise GC.
 
 I'm all for getting some better GC capabilities into Kaffe.
 
  What would be needed at a minimum would
  be:
  
  * Split local variables that can be either reference or non-reference
into two.  (The alternative is to keep track of when they are what).
 
 What do the other VMs with non-conservative GC do?

Hotspot splits them, but any VM needs to keep track of the stack.
It would be nice if the compiler didn't reuse stack positions
for reference and non-reference values.  I don't know if any
compilers do that, but it would save the rewriting and cost
very little.

  * Keep track of which stack positions are references when we do a
call or new.  Luckily the stack is zapped on an exception so we
don't have trouble there.
 
 Where does the information about which stack positions are references
 get stored?  On the stack, I suppose?

That's one solution.  The other is to have a structure that
correlates byte code position with stack types.
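Such a structure might be as simple as a sorted table, looked up when the collector stops a thread at a call or allocation site (a hypothetical sketch, not Kaffe's actual representation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-method table: for each pc that can trigger a GC (calls, news),
 * record which operand-stack slots hold references as a bitmap. */
typedef struct {
    uint32_t pc;          /* bytecode offset of the call/new */
    uint32_t ref_bitmap;  /* bit i set => operand-stack slot i is a ref */
} StackMapEntry;

/* Return the reference bitmap for a pc, or 0 if pc is not a GC point.
 * Entries are kept sorted by pc, so a real VM could binary-search. */
static uint32_t stack_refs_at(const StackMapEntry *maps, size_t n,
                              uint32_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].pc == pc)
            return maps[i].ref_bitmap;
    return 0;
}
```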

If you cache the top few stack positions in registers then you
can reserve some registers for references and others for non-references.
Then you just need to tag the part of the stack (if any) that you spill
out to memory.  This isn't really going to fly in its simplest form on
machines with fewer than 32 registers, and for the interpreter it means
gcc only, since no other compilers let you control register usage so
closely (trouble with exceptions).

 I need to look at some other VM implementations to see how they keep
 track of references on the stack (not to mention objects on the heap).
 
  * Make copies of jsr code so that each basic block can only be called
from one jsr call point chain (default the null chain of course)
 
 I think I understand the idea behind splitting the local variables
 (since the local variable used to store the return PC from a jsr call
 could also be used to store a reference).

This is a different optimisation, the main point of which is
to make verification about 10 times easier.  See
http://citeseer.nj.nec.com/freund98costs.html

  There's an obvious problem that when we split the variable, one
  of the new variables could end up above the 256 byte level.  Similarly
  we could end up with more than 65536 bytes in a method.
  In order to fix this a little byte code rewriting is inevitable.  I
  would suggest we rewrite the byte codes such that:
  
  * All insns are wide (16 bit index to local variables), and we drop
the wide prefix
  * All inline constants are native-endian
  * All pc indexes are 32 bits and of course native endian (mostly a
problem with the exception table I think, also replace GOTO with
GOTO_W throughout)
 
 The ideas sound nice, although we should think seriously about what
 the tradeoffs are before making this the default.  The rewriting phase
 is some additional overhead, and there is some code expansion (but
 maybe the resulting code will be faster).

Yes, perhaps we can avoid the rewriting in the common case, ie
no jsrs and not more than 256 local variables even after splitting.

 I think it would be nice to have a generic spot where we can rewrite
 bytecode before JITing it.  I imagine that could be used for some
 other optimizations too.

The native byte ordering is a win under any circumstances I think.

 Theoretically, somebody could hand code a subroutine that did
 something different with the jsr return PC value that is pushed onto
 the stack (instead of using an astore, as is typically found in a
 finally clause).  In real life, that probably never happens, but isn't
 it still legal bytecode?

I don't think it is, actually.

The only slightly tricky thing you can do with them is to return
several levels in one by returning with the previous return value
instead of the most recent one.  So we would need to keep track of
that.  But you can't recurse, you can't have more than one return
point and you of course can't construct a synthetic return value.
We need to check those things anyway in the byte code verifier.
 
-- 
Erik Corry [EMAIL PROTECTED]



Re: Interesting results...

2002-03-21 Thread Erik Corry


On Thu, Mar 21, 2002 at 05:17:13PM +1100, Andrew Dalgleish wrote:
 
 On Thu, Mar 21, 2002 at 04:42:13AM +0100, Dalibor Topic wrote:
 Try to rebuild Klasses.jar using jikes 1.13
 
 jikes == 1.13 or jikes >= 1.13?

jikes == 1.13.

-- 
EC



Re: Interesting results...

2002-03-20 Thread Erik Corry


On Thu, Mar 21, 2002 at 01:28:07AM -0500, Mason Loring Bliss wrote:
 
 I've built Jikes 1.13 and set JIKES = jikes in rebuildLib and Makefile,
 and this is what I see:
 
 acheron /usr/local/src/kaffe/libraries/javalib# make classes
 /bin/sh ./rebuildLib
 Compiling classes ...
 
 Issued 1 semantic warning compiling java/awt/Toolkit.java:
 
    504. native static synchronized void imgProduceImage( ImageNativeProducer prod, Ptr imgData);
 *** Warning: The type ImageNativeProducer is defined in the file Image.java but referenced in the file java/awt/Toolkit.java. It is recommended that it be redefined in ImageNativeProducer.java.

It's only a warning.

 I'll try a newer Jikes tomorrow, I guess. (Mm, cvs update -r foo.)

As far as I can see Jikes 1.13 is the one that works.  The
alternative is to actually find the bug, probably somewhere
in the verifier.  I tried running with Electric Fence and it
made no difference, so I don't think it's a malloc bug.

-- 
Erik Corry [EMAIL PROTECTED]



Re: VerifyError in PushbackReader

2002-03-17 Thread Erik Corry


On Sat, Mar 16, 2002 at 05:27:05PM -0800, Archie Cobbs wrote:
 
 Erik Corry writes:
  Whenever I try to run a program (javac, HelloWorld, appletviewer)
  I get the same error:
  
  java.lang.VerifyError: at pc 5 sp 7 not in range [4, 6]
   at java.io.PushbackReader.<init>(PushbackReader.java:32)
   at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:68)
   at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:81)
  at at.dms.kjc.Main.parseFile(Main.java:278)
  at at.dms.kjc.Main.run(Main.java:116)
  at at.dms.kjc.Main.compile(Main.java:68)
  at at.dms.kjc.Main.main(Main.java:59)
 
 I'm pretty sure it's a verifier problem. I've seen this before,
 and when I sent the classfile to the jikes people, they said it
 looked fine to them.
 
 You might try rebuilding Klasses.jar with jikes or kjc (whichever
 wasn't used before) as a possible workaround..

Anyone know what was used before?  I have jikes 1.13 installed.
How do I rebuild Klasses.jar?

If it's a compiler problem, then the error message is very misleading.
I just compiled with -DDEBUG, and the error message went away, but
now all programs terminate 'normally' without doing anything.  Here's
the last part of the output of HelloWorld, running with

  export KAFFE_VMDEBUG=JARFILES,ELOOKUP,CLASSFILE

http://www.arbat.com/erik/error.txt 

-- 
Erik Corry [EMAIL PROTECTED]
Quote only what is necessary.  Delete this signature.  Reply below the quoted text.



Re: VerifyError in PushbackReader

2002-03-17 Thread Erik Corry


On Sun, Mar 17, 2002 at 02:17:46PM +, Chris Gray wrote:
  
  With earlier versions of jikes, I would get verify errors like this
  whenever there was a private or protected constructor (rather than
  public) on an internal class. PushbackReader has that on the
  PushbackBuffer class, so that might be worth checking out.
  
  With jikes 1.14 this seems to have been fixed, but jikes 1.15 is
  completely broken as far as I know and produces verify errors all over
  the place.
 
 Yes, jikes 1.15 produces bad code: this is not specific to kaffe.
 (Jikes 1.15b is said to be better, but we haven't tried it out yet).

OK, I recompiled with jikes-1.13 (jikes-1.13-1 from RedHat 7.1) and
now I don't have the problem.  The way the problem came and went
depending on the time of day, and whether I stopped the program in
the debugger makes me think that it was caused by uninitialised
data on the stack.  The verifier files are filled with variables
that are not initialised when they are declared, though none of them
are obvious problems.

At any rate I have a workaround now, so I am happy.  I didn't try
with jikes-1.14, because I already deinstalled it.  It crashes
on many of the files in GNU ClassPath (actually the Sable version),
so I didn't trust it.

If someone recompiles with 1.15b then I can try the resulting .jar
file to see if it works for me.  It would be nice to have a version
of Klasses.jar in CVS that doesn't show the problem.

-- 
Erik Corry [EMAIL PROTECTED]



VerifyError in PushbackReader

2002-03-16 Thread Erik Corry


Hi

I have the newest version of Kaffe checked out of CVS
on a Red Hat 7.2-like system.  I compiled with

CC=gcc3 CCC=g++3 CFLAGS=-g ./configure

(the normal Red Hat 2.96 version of gcc had trouble with the
libgcj header files).  I'm running into a bug which is also
entered in the bug database (in incoming), but which I can't
contribute to as I can't find how to log in as anyone other
than guest.

Whenever I try to run a program (javac, HelloWorld, appletviewer)
I get the same error:

java.lang.VerifyError: at pc 5 sp 7 not in range [4, 6]
at java.io.PushbackReader.<init>(PushbackReader.java:32)
at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:68)
at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:81)
at at.dms.kjc.Main.parseFile(Main.java:278)
at at.dms.kjc.Main.run(Main.java:116)
at at.dms.kjc.Main.compile(Main.java:68)
at at.dms.kjc.Main.main(Main.java:59)

I haven't been able to disassemble PushbackReader to find out
whether it is the class (jar) file that is the problem or
the verifier, but I guess it is the verifier.  When I run
kaffe under gdb the problem seems to jump around, sometimes
happening at the 1805th call of verifyBasicBlock, sometimes
later.  I haven't been able to pin it down.  This makes me
think it has to do with uninitialised data on the stack.

When I try to run javap on another system in order to take
a look at the PushbackReader in the Klasses.jar file, I
think it just disassembles the PushbackReader in its own
java system - it's hard to tell, since it doesn't say
where it is getting the info from.  Also, I'm not sure
what the <init> means.

Any ideas?

-- 
Erik Corry [EMAIL PROTECTED]