<long post on improvements in Derby's code generation follows> I'm working on DERBY-176/DERBY-766 to increase the size of queries that Derby can handle, and yes, the examples in the largeCodeGen test are based upon real customer examples.
http://issues.apache.org/jira/browse/DERBY-766
http://issues.apache.org/jira/browse/DERBY-176

What follows is a dump of my thoughts on my current direction.

Code generation for large queries runs into the per-method code limit of 64k from the JVM spec. Derby currently already overflows into multiple methods in two ways:

1) At the caller level, i.e. the SQL compiler. There are a couple of approaches to this: one is counting statements and switching to a new method at 2,000 statements; the other is always generating a section in a sub-method and calling it from the primary method. The trouble here is that it's very much reactive: we see a failure to generate code when a query has lots of X, and so we only adjust the code that generates X.

2) At Derby's byte code compiler level. This is the best place, as the caller is unaware of it. The current approach is to start looking for places to split the method when its size is approaching the 64k limit. So after around 50k bytes of code the compiler will try to split the method if it can. This leads to a potential calling sequence like:

    // method returned to caller (i.e. the SQL compiler)
    public Object e0() {
        // ~50k of code
        return e0_1();
    }

    private Object e0_1() {
        // ~50k of code
        return e0_2();
    }

    private Object e0_2() {
        // remaining code
        // note the stack will be N frames deeper,
        // depending on the number of sub-methods we
        // had to generate, than if we had a single large method.
        return result;
    }

The code starts looking at around 50k because in its current form there are limited places where the split can occur, and since it may take a large number of instructions to get to a split point, the code is conservative.

One criterion is that the method's current stack depth must be zero at the split point. This simplifies a lot: there is no need to pass arguments to the sub-methods, and no concerns about later instructions requiring values already on the stack. Another current criterion is that the method has no parameters. This one is actually easy to fix, but it doesn't benefit Derby much, since the generated methods that take parameters tend not to have the stack depth dropping to zero, so they can't be split anyway. One other criterion is that the split point cannot be in the middle of a conditional block, which simply avoids a host of tricky issues.

If you mapped the methods that can currently be split to Java language code, they would look like a list of independent statements, e.g.

    public void bigMethod() {
        a();
        b(3);
        c(d());
        f = e();
        ...
    }

One can see that if you were coding in the Java language and the method got too big, this would be easy to split, but you wouldn't do it like the nested approach above. Instead, for the e0 example above, you would do this:

    public Object e0() {
        // ~64k of code
        e0_1();
        return e0_2();
    }

    private void e0_1() {
        // ~64k of code
    }

    private Object e0_2() {
        // remaining code
        // note the stack will be only one frame deeper
        // than if we had a single large method.
        return result;
    }

This is better for two reasons:

- The stack is only ever one frame deeper than with a single big method. With the current nested approach it is N frames deeper.

- There is no need to split early; each method can be used to its maximum capacity, leading to potentially fewer methods, e.g. 7 sub-methods instead of 8.

The other type of method that Derby generates, and currently cannot split, would look like this in the Java language:

    public Object e0() {
        return a(b(c(d(), e(), f()),
                   e(),
                   k()),
                 o(), j(), e());
    }

That is, deeply nested method calls, hence the stack depth never dropping to zero.
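Looking back at case 2) and the split criteria above, the current rule amounts to something like the following. This is a purely illustrative sketch, not Derby's actual byte code compiler classes; the class, field and constant names are invented for the example.

    // Hypothetical illustration of the current splitting criteria -- not Derby code.
    public class SplitWhileGenerating {

        // Conservative threshold: start looking for a split point well
        // before the JVM's 65535 byte per-method code limit.
        private static final int SPLIT_THRESHOLD = 50 * 1024;

        private int codeLength;        // bytes emitted so far in the current method
        private int stackDepth;        // current operand stack depth
        private int parameterCount;    // parameters of the method being generated
        private boolean inConditional; // currently inside a conditional block

        // Should the generator end the current method here and continue
        // in a new sub-method (producing the nested e0 -> e0_1 -> e0_2 chain)?
        boolean shouldSplitNow() {
            return codeLength >= SPLIT_THRESHOLD
                && stackDepth == 0      // nothing would need to be passed along
                && parameterCount == 0  // current restriction, noted above as easy to lift
                && !inConditional;      // never split mid conditional block
        }
    }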
Those deeply nested method calls come from the creation of the internal language result sets used to execute the query. These result sets are created through factory methods, some of which have up to 26 arguments, and some of those arguments will be deeply nested calls themselves. So with a SELECT with 5000 UNIONs there will probably be tens of thousands of result sets generated, with maybe one hundred thousand parameters in total. If you were going to split that in the Java language, you would take groups of method calls and put them in a separate method, something like:

    public Object e0() {
        return a(e0_1(), o(), j(), e());
    }

    private Object e0_1() {
        return b(e0_2(), e(), k());
    }

    private Object e0_2() {
        return c(d(), e(), f());
    }

So, I'm trying to generalize splitting a method into multiple methods so that it is contained within the byte code compiler, and eventually to remove the external places where method splitting is happening. Based on my current thinking I'm going to try and take a "post compilation" approach. The current internal approach occurs as the byte code for the method is being built up, but after thinking about this for a long time I believe splitting after the method is complete is the correct approach. It has the advantages listed above, reduced stack frames and fewer methods, and in some cases we would end up with a single method where today we split around the 50k limit.

One key requirement is the ability to understand the structure of already generated byte code. The current compiler just builds the instructions as it goes along; there is no code that can walk the instruction stream once it has been generated. That is, understanding how many bytes make up each instruction so that one can move through the underlying byte array in the correct sequence.

The next requirement is to be able to calculate the maximum stack depth for any arbitrary section of byte code so that it can be pushed into a sub-method. Today the maximum stack depth is calculated as the instruction stream is built, and since we currently split methods at the current instruction and then continue building in the new method, the max stack calculation for the new method is handled in the same way.

I'm now close to having a patch that performs just the two basic required functions: walking the byte code and calculating the maximum stack depth. I can first use this code in a sanity check, comparing it against the dynamically calculated maximum stack depth; if the two are not equal there is a bug in my walking code (which also has to handle conditionals). This would then form the basis for continuing on, which would be:

1) Identifying points to split the code. My current thought is that there are two cases:

   a) zero stack depth (the list-of-statements case);

   b) non-zero stack depth. Here I think split points have to be instructions such as method calls and putfields, where it's easy(ish) to identify the earliest point in the byte code that the instruction depends on, leading to a block of code that can be pushed into a sub-method (like above).

2) Pushing blocks of code into sub-methods. I just have some ideas here, such as detecting whether the method's parameters are referenced in the block to be moved; if not, there is no need to pass the parameters to the sub-method.

This approach also requires that the byte code compiler be able to build a method in memory bigger than the 64k limit as a first step, which can then be split after analysis into sub-methods. Luckily this is exactly what it does today :-) Though handling conditional blocks is an issue here.
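To make the two basic functions concrete (walking the instruction stream and recomputing the maximum stack depth), here is a deliberately tiny sketch, again not Derby's actual code. It only knows a handful of fixed-length opcodes and takes the stack effect of a method call as a caller-supplied value; the real code would have to cover the whole instruction set, variable-length instructions such as tableswitch and wide, derive the stack effect of each call from its descriptor in the constant pool, and follow conditional branches.

    // Hypothetical sketch of walking a finished byte array and recomputing
    // the maximum stack depth -- not Derby's actual byte code compiler.
    public class ByteCodeWalker {

        private static final int ALOAD_0       = 0x2a; // 1 byte,  pushes 1
        private static final int ICONST_3      = 0x06; // 1 byte,  pushes 1
        private static final int INVOKEVIRTUAL = 0xb6; // 3 bytes, effect depends on descriptor
        private static final int PUTFIELD      = 0xb5; // 3 bytes, pops objectref + value
        private static final int ARETURN       = 0xb0; // 1 byte,  pops 1

        // Returns the maximum operand stack depth reached by the code.
        // callStackDelta stands in for the per-call stack effect that would
        // normally be derived from the constant pool entry.
        public static int maxStack(byte[] code, int callStackDelta) {
            int depth = 0;
            int max = 0;
            int pc = 0;
            while (pc < code.length) {
                int opcode = code[pc] & 0xff;
                switch (opcode) {
                case ALOAD_0:
                case ICONST_3:
                    depth += 1;
                    pc += 1;       // single byte instruction
                    break;
                case INVOKEVIRTUAL:
                    depth += callStackDelta;
                    pc += 3;       // opcode + two byte constant pool index
                    break;
                case PUTFIELD:
                    depth -= 2;
                    pc += 3;
                    break;
                case ARETURN:
                    depth -= 1;
                    pc += 1;
                    break;
                default:
                    throw new IllegalStateException("unhandled opcode " + opcode);
                }
                if (depth > max) {
                    max = depth;
                }
            }
            return max;
        }
    }

The sanity check described above would then just compare the value returned by such a walker with the maximum stack depth the compiler tracked while it was emitting the instructions.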
I think that once this is working, it and a couple of other minor improvements will bump the limit on SQL statement complexity higher than any application is likely to require. Once the multiple-method work is done, the next big jump would be to support multiple classes per statement, but I'm not sure it's worth the effort.

One comment on risk and quality: I code such changes so that they can be invoked at a lower threshold than the optimal one, for example splitting to leave methods with a maximum size of 1k. Then I run derbyall with the limit lowered to ensure the code is exercised. I did this for the existing sub-methods, splitting at some value lower than 50k, and for the recent change for conditionals, invoking the re-writes at jump offsets greater than 50 bytes instead of 32767. Otherwise these changes would most likely not be exercised by the regular tests, though the largeCodeGen test is a start here. Running derbyall with lowered limits gives a greater chance of finding bugs in the initial implementation.

Sorry if this is a collection of unfocussed thoughts.

Dan.
