Re: Rewriting byte codes
On Sun, Mar 31, 2002 at 08:41:34PM -0700, Patrick Tullmann wrote:
> Erik wrote:
> > I'd like to make some changes to Kaffe to make it simpler to do more
> > precise GC.
>
> Just for reference, so everyone's on the same page, Kaffe already does
> precise walking of Java objects (see gcFuncs.c:walkObject()). It does
> not precisely walk stacks. I believe it walks many native objects
> (like jthreads, etc.) conservatively.
>
> The big problem with walking the stack isn't the Java stack as much as
> the native stack. You could walk the Java parts precisely, and the
> native bits conservatively, but I don't know that you'd win anything
> by doing this.

OK, I'm not so familiar with the way Java interacts with native code,
but why do we need to walk the native bits at all? Surely C code
doesn't need GC?

> As I understand GC trade-offs, the big win for precise GC is the
> ability to update pointers and thus implement a compacting collector.
> Is there something else you're hoping to get out of precise stack
> walking?

Predictability and speed of GC.

> Another approach to consider is to implement GC-safe points (e.g., on
> method calls and backwards branches in Java code). Then you only have
> to track and update the stack maps at each safe point.

There's a lot to be said for this, but since you can allocate unlimited
memory in an exception handler, every point that can throw an exception
has to be a safe point, which reduces the appeal.

--
Erik Corry [EMAIL PROTECTED]
Quote only what is necessary. Delete this signature. Reply below the quoted text.
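A conservative walk of the kind described above can be sketched in C. This is a hypothetical illustration, not Kaffe's actual scanner: the bounds-and-alignment test stands in for a real heap-membership check. It shows why a conservative scan is imprecise — an integer that merely looks like a heap address is retained as a root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a conservative stack scan: every word in the
 * range is treated as a potential pointer, and anything that falls
 * inside the heap is kept alive.  Integers that merely look like heap
 * addresses are retained too -- the imprecision discussed above. */
static int looks_like_heap_pointer(uintptr_t word,
                                   uintptr_t heap_lo, uintptr_t heap_hi)
{
    return word >= heap_lo && word < heap_hi &&
           (word % sizeof(void *)) == 0;   /* require pointer alignment */
}

/* Count the words in [lo, hi) that a conservative scan would treat
 * as GC roots. */
static size_t conservative_scan(const uintptr_t *lo, const uintptr_t *hi,
                                uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t roots = 0;
    for (const uintptr_t *p = lo; p < hi; p++)
        if (looks_like_heap_pointer(*p, heap_lo, heap_hi))
            roots++;
    return roots;
}
```

Because such "pointers" cannot be distinguished from integers, none of them may be updated, which is why a conservative collector cannot move objects.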
Re: Rewriting byte codes
On Mon, Apr 01, 2002 at 01:05:50PM +0200, Artur Biesiadowski wrote:
> Erik Corry wrote:
> > There's a lot to be said for this, but since you can allocate
> > unlimited memory in an exception handler, every point that can throw
> > an exception has to be a safe point [...]
>
> If an exception is thrown, you don't care about registers (unless you
> write-cache locals in registers,

Good point, but you still have the problem of a thread that has been
suspended by the OS at a random point (if you use OS threads, which you
have to if you want to benefit from SMP).

> but it will give problems even without precise gc),

Why? Not that x86 has enough registers, but it might be interesting for
RISC. I think Hotspot allocates registers freely to both stack and
local variables.

> You also need to mark which local variables are live and which are
> dead - because if you treat dead object pointers as live, you are
> non-precise and possibly unstable (as they can point inside some newly
> allocated object).

As far as I can see it can only be _either_ non-precise _or_ unstable.
It could be non-precise in the sense that you might not free some
memory as soon as you could have (but not in the sense that you can't
have a moving collector), but if you don't collect you can't get
pointers inside a new object. Obviously imprecision is vastly
preferable to instability. You can get around the imprecision by
inserting zeroing ops at strategic places. Even if you omit this step
the GC reliability is already much better than what we have now.

> But if you have a variable map with local liveness you can as well add
> the object/primitive distinction there and forget about explicit
> separation in memory :)

If you disambiguate local variables that can be used for both
references and non-references, your 'map' is reduced to a single bitmap
per method, and the lookup function the GC uses can be correspondingly
simple. If you also sort the locals, the map reduces to a single index
which is the boundary between the references and the non-references.

I still think you need a full map for the JVM operand stack and/or
registers. You can make them quite compact if you get clever about it.
For example you can put a rudimentary disassembler in, so the program
can trace what's going on from the last 'safe point' to the current
place.

> AFAIK, hotspot stops the thread, replaces the closest safe points with
> some trap and lets the thread run until it hits one. Then it restores
> the original instruction and voila.

This sounds pretty ugly to me, since it involves lots of writing to the
instruction stream with corresponding I-cache flushes etc. But it's
doable and perhaps simpler than keeping stack maps around for all
points.

> Generally - it is hard. Not even stack maps and their representation,
> but getting everything working with threads in an efficient manner.

Yes, I can see that.

--
Erik Corry [EMAIL PROTECTED]
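The two map representations sketched above (a per-method bitmap once ambiguous locals are split, and a single boundary index once the locals are also sorted) could look like this in C. The layout is a hypothetical illustration, not Kaffe's.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: one bit per local-variable slot, set if the slot always
 * holds a reference.  With reference/non-reference locals split, no
 * slot is ever ambiguous, so a single bitmap per method suffices. */
static int local_is_reference(uint32_t ref_bitmap, unsigned slot)
{
    return (ref_bitmap >> slot) & 1u;
}

/* Variant 2: if the locals are additionally sorted so that all
 * reference slots come first, the whole map collapses to one number:
 * the boundary index between references and non-references. */
static int local_is_reference_sorted(unsigned ref_boundary, unsigned slot)
{
    return slot < ref_boundary;
}
```

The GC's lookup function is then a shift-and-mask in the first variant and a single comparison in the second.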
Re: Rewriting the GC
On Mon, Apr 01, 2002 at 11:10:18AM -0700, Patrick Tullmann wrote:
> Erik wrote:
> > > The big problem with walking the stack isn't the Java stack as much
> > > as the native stack. You could walk the Java parts precisely, and
> > > the native bits conservatively, but I don't know that you'd win
> > > anything by doing this.
> >
> > OK, I'm not so familiar with the way Java interacts with native code,
> > but why do we need to walk the native bits at all? Surely C code
> > doesn't need GC?
>
> All the native methods in Kaffe, the threading system, the core class
> loading --- that's all written in C. That code uses the same stack as
> the Java code. That code uses Java object references left and right,
> and stores them on the stack.

Ah, now I understand. Well, I'm not going to do the work here, so
perhaps I should just shut up, but one way to do this is to have a
(per-thread) global called gc_chain that is available in every
function. Then when you want to have gc-able structures on the stack
you do something like

  struct thingie {
    gcChain *next;
    int signature;
    /* here go the real fields */
  };

Then you use it with

  {
    struct thingie t;
    t.next = gc_chain;
    t.signature = THINGIESIGNATURE;
    /* memzero the rest of t */
    gc_chain = (gcChain *)&t;
    /* now do stuff with t */
  }

The GC has a chain of structures on the stack, each with a signature
that tells it what types are in it. You can have signatures called
1reference, 2references, 3references etc. to allow you to save single
pointers (not structs) on the stack too.

For this to work you can't allow GC at random points in the code (since
things could be in registers), but every so often (i.e. at back edges
etc.) the C code has to call checkgc(gc_chain) (the parameter will
ensure that all registers are flushed to memory), which will check
whether the other threads are waiting to do a GC. The gc_chain should
also be a parameter to the allocation routine so we can always do a GC
in there. I guess we have to have the ability to walk forward to the
next safe point, and since that's the same as what we need in Java code
I guess that's the way to do it.

I haven't thought through all the details of this idea, but it seems to
me it should work.

> > > As I understand GC trade-offs, the big win for precise GC is the
> > > ability to update pointers and thus implement a compacting
> > > collector. Is there something else you're hoping to get out of
> > > precise stack walking?
> >
> > Predictability and speed of GC.
>
> You're talking about bogus pointer references in a conservative scan
> being viewed as pointers, when in fact they're just integers that look
> like pointers?

Actually, I was just talking about compacting/generational/semispace
GCs, which require the ability to move objects, and which are all
faster than non-moving GCs because they only treat a small part of the
heap on any given GC.

> [...] nice ones) a system that supports safe points is (and I'm
> somewhat guessing here) much easier to write safe native code in ---
> you don't have to worry about your C code getting interrupted by the
> GC at arbitrary points, only at safe points you explicitly insert in
> your code. There are downsides, of course --- like if you forget to
> put a safe point in somewhere, the GC can be blocked for a long time.

I agree now.

--
Erik Corry [EMAIL PROTECTED]
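The gc_chain idea above is essentially a shadow stack. A compilable sketch of it, under the assumptions of Erik's example (the gcChain and thingie names come from his fragment; the root-counting walker and the use of the signature as a reference count are hypothetical additions for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Shadow-stack sketch after the gc_chain idea.  Each C stack frame
 * that holds GC-able references links a header into a chain; the
 * signature tells the collector what follows the header (here,
 * simply the number of reference slots). */
typedef struct gcChain {
    struct gcChain *next;
    int signature;               /* here: number of reference slots */
} gcChain;

static gcChain *gc_chain;        /* per-thread in a real VM */

struct thingie {
    gcChain header;
    void *refs[2];               /* the real (reference) fields */
};

/* Walk the chain and count every reference the GC would treat as a
 * root -- the precise analogue of scanning the native stack. */
static int count_roots(void)
{
    int n = 0;
    for (gcChain *c = gc_chain; c != NULL; c = c->next)
        n += c->signature;
    return n;
}

static int demo(void)
{
    struct thingie t = { { gc_chain, 2 }, { NULL, NULL } };
    gc_chain = &t.header;        /* push this frame's references */
    int roots = count_roots();   /* a checkgc() call could GC here */
    gc_chain = t.header.next;    /* pop before the frame dies */
    return roots;
}
```

The push/pop discipline in demo() is the cost of the scheme: every function that stores references on the C stack has to maintain the chain by hand, and forgetting the pop corrupts it.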
Re: Rewriting byte codes
On Tue, Apr 02, 2002 at 02:48:36AM +0200, Artur Biesiadowski wrote:
> Erik Corry wrote:
> > > AFAIK, hotspot stops the thread, replaces the closest safe points
> > > with some trap and lets the thread run until it hits one. Then it
> > > restores the original instruction and voila.
> >
> > This sounds pretty ugly to me, since it involves lots of writing to
> > the instruction stream with corresponding I-cache flushes etc.
>
> I think that the I-cache flush penalty is negligible compared to the
> cost of switching threads a few times and performing gc. We are
> talking about an extra 20-50 cycles, and gc will take about 1ms
> minimum (which gives about 100 cycles).

The real cost of an I-cache flush isn't the 20-50 cycles it takes to
flush the I-cache, it's the time it takes to refill it again from L2.
But it's probably the right tactic anyhow. The alternative is to put
checks in at every safe point, so you can just set a global instead of
having to rewrite code.

--
Erik Corry [EMAIL PROTECTED]
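The polling alternative mentioned above (set a global, check it at every safe point instead of patching traps into the instruction stream) might look like this. A hypothetical sketch: the names gc_requested, do_gc and safe_point are invented for illustration, and a real VM would use an atomic flag per thread group.

```c
#include <assert.h>

/* Polling safe points: instead of rewriting code, the GC sets a flag
 * and every thread checks it at each safe point (method calls and
 * backward branches).  Hypothetical sketch. */
static volatile int gc_requested;
static int gc_runs;

static void do_gc(void)
{
    gc_runs++;          /* stand-in for an actual collection */
    gc_requested = 0;
}

/* Inserted by the JIT / interpreter at every safe point. */
static void safe_point(void)
{
    if (gc_requested)
        do_gc();
}

static int sum_to(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++) {  /* backward branch... */
        safe_point();               /* ...so poll here */
        s += i;
    }
    return s;
}
```

The trade-off is exactly the one in the exchange above: polling costs a test-and-branch on every back edge even when no GC is pending, while trap patching costs nothing until the GC actually runs but then has to write to the instruction stream.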
Rewriting byte codes
Hi

I'd like to make some changes to Kaffe to make it simpler to do more
precise GC. What would be needed at a minimum would be:

* Split local variables that can be either reference or non-reference
  into two. (The alternative is to keep track of when they are what.)

* Keep track of which stack positions are references when we do a call
  or new. Luckily the stack is zapped on an exception, so we don't have
  trouble there.

* Make copies of jsr code so that each basic block can only be called
  from one jsr call point chain (default the null chain, of course).

There's an obvious problem that when we split a variable, one of the
new variables could end up above index 255, the limit of the one-byte
local-variable index. Similarly we could end up with more than 65536
bytes in a method. In order to fix this a little byte code rewriting is
inevitable. I would suggest we rewrite the byte codes such that:

* All insns are wide (16-bit index to local variables), and we drop the
  wide prefix.

* All inline constants are native-endian.

* All pc indexes are 32 bits and of course native-endian (mostly a
  problem with the exception table, I think; also replace GOTO with
  GOTO_W throughout).

Copying jsr code is very nice because it means all problems related to
jsrs just disappear. We can replace jsrs and rets with gotos. You are
not allowed to recurse with jsr (though you are allowed nesting), so
semantics should be preserved. In theory it can give exponential code
expansion; in practice I doubt it does. I don't think you can construct
an example in Java that gives exponential code expansion, and if you
had byte code that did that, it would take exponential time to verify,
so it's a bad idea anyway.

Any comments? I'm not guaranteeing I'll have time, but right now it
looks as though I may need it for something else anyway.

--
Erik Corry
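The first rewriting rule above ("all insns are wide, drop the wide prefix") can be sketched for a single instruction. A hypothetical illustration for iload (opcode 0x15): the input is standard class-file encoding (big-endian, with the 0xc4 wide prefix for 16-bit indexes), and the output is this proposal's uniform internal form, not anything defined by the class-file format. Little-endian is used here as the stand-in for "native-endian" so the example is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_ILOAD 0x15
#define OP_WIDE  0xc4

/* Rewrite one iload into the uniform "always wide" internal form:
 * the opcode followed by a 16-bit local-variable index (little-endian
 * here, standing in for native-endian).  Handles both the narrow form
 * (iload idx8) and the class-file wide form (wide iload idx16,
 * big-endian).  Returns the number of input bytes consumed. */
static size_t widen_iload(const uint8_t *in, uint8_t *out)
{
    uint16_t index;
    size_t consumed;

    if (in[0] == OP_WIDE) {          /* wide iload idx16, big-endian */
        index = (uint16_t)((in[2] << 8) | in[3]);
        consumed = 4;
    } else {                         /* iload idx8 */
        index = in[1];
        consumed = 2;
    }
    out[0] = OP_ILOAD;
    out[1] = (uint8_t)(index & 0xff);
    out[2] = (uint8_t)(index >> 8);
    return consumed;
}
```

A full rewriter would do this for every local-variable instruction and then fix up all branch offsets and the exception table, which is where most of the real work lies.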
Re: Rewriting byte codes
On Sun, Mar 31, 2002 at 03:40:48PM -0800, Jim Pick wrote:
> On Sun, Mar 31, 2002 at 09:02:46PM +0200, Erik Corry wrote:
> > Hi
> >
> > I'd like to make some changes to Kaffe to make it simpler to do more
> > precise GC.
>
> I'm all for getting some better GC capabilities into Kaffe.
>
> > What would be needed at a minimum would be:
> >
> > * Split local variables that can be either reference or
> >   non-reference into two. (The alternative is to keep track of when
> >   they are what.)
>
> What do the other VMs with non-conservative GC do?

Hotspot splits them, but any VM needs to keep track of the stack. It
would be nice if the compiler didn't reuse stack positions for
reference and non-reference values. I don't know if any compilers do
that, but it would save the rewriting and cost very little.

> > * Keep track of which stack positions are references when we do a
> >   call or new. Luckily the stack is zapped on an exception, so we
> >   don't have trouble there.
>
> Where does the information about which stack positions are references
> get stored? On the stack, I suppose?

That's one solution. The other is to have a structure that correlates
byte code position with stack types.

If you cache the top few stack positions in registers then you can
reserve some registers for references and others for non-references.
Then you just need to tag the part of the stack (if any) that you spill
out to memory. This isn't really going to fly in its simplest form on
machines with fewer than 32 registers, and for the interpreter it means
gcc only, since no other compilers let you control register usage so
closely (trouble with exceptions).

> I need to look at some other VM implementations to see how they keep
> track of references on the stack (not to mention objects on the heap).
>
> > * Make copies of jsr code so that each basic block can only be
> >   called from one jsr call point chain (default the null chain, of
> >   course).
>
> I think I understand the idea behind splitting the local variables
> (since the local variable used to store the return PC from a jsr call
> could also be used to store a reference).

This is a different optimisation, the main point of which is to make
verification about 10 times easier. See
http://citeseer.nj.nec.com/freund98costs.html

> > There's an obvious problem that when we split the variable, one of
> > the new variables could end up above the 256 level. Similarly we
> > could end up with more than 65536 bytes in a method. In order to fix
> > this a little byte code rewriting is inevitable. I would suggest we
> > rewrite the byte codes such that:
> >
> > * All insns are wide (16-bit index to local variables), and we drop
> >   the wide prefix.
> > * All inline constants are native-endian.
> > * All pc indexes are 32 bits and of course native-endian (mostly a
> >   problem with the exception table, I think; also replace GOTO with
> >   GOTO_W throughout).
>
> The ideas sound nice, although we should think seriously about what
> the tradeoffs are before making this the default. The rewriting phase
> is some additional overhead, and there is some code expansion (but
> maybe the resulting code will be faster).

Yes, perhaps we can avoid the rewriting in the common case, i.e. no
jsrs and not more than 256 local variables even after splitting.

> I think it would be nice to have a generic spot where we can rewrite
> bytecode before JITing it. I imagine that could be used for some other
> optimizations too.

The native byte ordering is a win under any circumstances, I think.

> Theoretically, somebody could hand code a subroutine that did
> something different with the jsr return PC value that is pushed onto
> the stack (instead of using an astore, as is typically found in a
> finally clause). In real life, that probably never happens, but isn't
> it still legal bytecode?

I don't think it is, actually. The only slightly tricky thing you can
do with them is to return several levels in one go by returning with a
previous return value instead of the most recent one. So we would need
to keep track of that. But you can't recurse, you can't have more than
one return point, and you of course can't construct a synthetic return
value. We need to check those things anyway in the byte code verifier.

--
Erik Corry [EMAIL PROTECTED]
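The "structure that correlates byte code position with stack types" mentioned in the exchange above could be as simple as a sorted table of (pc, bitmap) entries, one per safe point. A hypothetical sketch (the layout and names are invented for illustration, not Kaffe's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per safe point: the bytecode pc and a bitmap with one bit
 * per operand-stack slot (1 = reference).  Hypothetical layout. */
struct stack_map {
    uint32_t pc;
    uint32_t ref_bits;
};

/* Find the map for an exact safe-point pc; returns NULL if the pc is
 * not a safe point (the GC must never stop a thread there). */
static const struct stack_map *find_map(const struct stack_map *maps,
                                        size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;          /* binary search; maps sorted by pc */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (maps[mid].pc < pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && maps[lo].pc == pc) ? &maps[lo] : NULL;
}
```

Storing maps only at call sites and back edges keeps the table small, which is the point of safe points: most pcs never need an entry.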
Re: Interesting results...
On Thu, Mar 21, 2002 at 05:17:13PM +1100, Andrew Dalgleish wrote:
> On Thu, Mar 21, 2002 at 04:42:13AM +0100, Dalibor Topic wrote:
> > Try to rebuild Klasses.jar using jikes 1.13
>
> jikes == 1.13 or jikes >= 1.13?

jikes == 1.13.

--
EC
Re: Interesting results...
On Thu, Mar 21, 2002 at 01:28:07AM -0500, Mason Loring Bliss wrote:
> I've built Jikes 1.13 and set JIKES = jikes in rebuildLib and Makefile,
> and this is what I see:
>
>   acheron /usr/local/src/kaffe/libraries/javalib# make classes
>   /bin/sh ./rebuildLib
>   Compiling classes ...
>
>   Issued 1 semantic warning compiling java/awt/Toolkit.java:
>     504. native static synchronized void imgProduceImage(
>            ImageNativeProducer prod, Ptr imgData);
>   *** Warning: The type ImageNativeProducer is defined in the file
>   Image.java but referenced in the file java/awt/Toolkit.java. It is
>   recommended that it be redefined in ImageNativeProducer.java.

It's only a warning.

> I'll try a newer Jikes tomorrow, I guess. (Mm, cvs update -r foo.)

As far as I can see, Jikes 1.13 is the one that works. The alternative
is to actually find the bug, probably somewhere in the verifier. I
tried running with Electric Fence and it made no difference, so I don't
think it's a malloc bug.

--
Erik Corry [EMAIL PROTECTED]
Re: VerifyError in PushbackReader
On Sat, Mar 16, 2002 at 05:27:05PM -0800, Archie Cobbs wrote:
> Erik Corry writes:
> > Whenever I try to run a program (javac, HelloWorld, appletviewer) I
> > get the same error:
> >
> >   java.lang.VerifyError: at pc 5 sp 7 not in range [4, 6]
> >     at java.io.PushbackReader.<init>(PushbackReader.java:32)
> >     at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:68)
> >     at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:81)
> >     at at.dms.kjc.Main.parseFile(Main.java:278)
> >     at at.dms.kjc.Main.run(Main.java:116)
> >     at at.dms.kjc.Main.compile(Main.java:68)
> >     at at.dms.kjc.Main.main(Main.java:59)
>
> I'm pretty sure it's a verifier problem. I've seen this before, and
> when I sent the classfile to the jikes people, they said it looked
> fine to them. You might try rebuilding Klasses.jar with jikes or kjc
> (whichever wasn't used before) as a possible workaround..

Anyone know what was used before? I have jikes 1.13 installed. How do I
rebuild Klasses.jar? If it's a compiler problem, then the error message
is very misleading.

I just compiled with -DDEBUG, and the error message went away, but now
all programs terminate 'normally' without doing anything. Here's the
last part of the output of HelloWorld, running with

  export KAFFE_VMDEBUG=JARFILES,ELOOKUP,CLASSFILE

http://www.arbat.com/erik/error.txt

--
Erik Corry [EMAIL PROTECTED]
Quote only what is necessary. Delete this signature. Reply below the quoted text.
Re: VerifyError in PushbackReader
On Sun, Mar 17, 2002 at 02:17:46PM +, Chris Gray wrote:
> > With earlier versions of jikes, I would get verify errors like this
> > whenever there was a private or protected constructor (rather than
> > public) on an internal class. PushbackReader has that on the
> > PushbackBuffer class, so that might be worth checking out. With
> > jikes 1.14 this seems to have been fixed, but jikes 1.15 is
> > completely broken as far as I know and produces verify errors all
> > over the place.
>
> Yes, jikes 1.15 produces bad code: this is not specific to kaffe.
> (Jikes 1.15b is said to be better, but we haven't tried it out yet.)

OK, I recompiled with jikes-1.13 (jikes-1.13-1 from Red Hat 7.1) and
now I don't have the problem. The way the problem came and went
depending on the time of day, and on whether I stopped the program in
the debugger, makes me think that it was caused by uninitialised data
on the stack. The verifier files are full of variables that are not
initialised when they are declared, though none of them are obvious
problems. At any rate I have a workaround now, so I am happy.

I didn't try jikes-1.14, because I had already deinstalled it. It
crashes on many of the files in GNU Classpath (actually the Sable
version), so I didn't trust it. If someone recompiles with 1.15b then I
can try the resulting .jar file to see if it works for me. It would be
nice to have a version of Klasses.jar in CVS that doesn't show the
problem.

--
Erik Corry [EMAIL PROTECTED]
VerifyError in PushbackReader
Hi

I have the newest version of Kaffe, checked out of CVS, on a Red Hat
7.2-like system. I compiled with

  CC=gcc3 CCC=g++3 CFLAGS=-g ./configure

(the normal Red Hat 2.96 version of gcc had trouble with the libgcj
header files).

I'm running into a bug which is also entered in the bug database (in
incoming), but which I can't contribute to, as I can't find how to log
in as anyone other than guest. Whenever I try to run a program (javac,
HelloWorld, appletviewer) I get the same error:

  java.lang.VerifyError: at pc 5 sp 7 not in range [4, 6]
    at java.io.PushbackReader.<init>(PushbackReader.java:32)
    at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:68)
    at at.dms.compiler.tools.antlr.extra.InputBuffer.<init>(InputBuffer.java:81)
    at at.dms.kjc.Main.parseFile(Main.java:278)
    at at.dms.kjc.Main.run(Main.java:116)
    at at.dms.kjc.Main.compile(Main.java:68)
    at at.dms.kjc.Main.main(Main.java:59)

I haven't been able to disassemble PushbackReader to find out whether
it is the class (jar) file that is the problem or the verifier, but I
guess it is the verifier. When I run kaffe under gdb the problem seems
to jump around, sometimes happening at the 1805th call of
verifyBasicBlock, sometimes later. I haven't been able to pin it down.
This makes me think it has to do with uninitialised data on the stack.

When I try to run javap on another system in order to take a look at
the PushbackReader in the Klasses.jar file, I think it just
disassembles the PushbackReader in its own Java system - it's hard to
tell, since it doesn't say where it is getting the info from. Also, I'm
not sure what the <init> means.

Any ideas?

--
Erik Corry [EMAIL PROTECTED]