<long post on improvements in Derby's code generation follows> I'm working on DERBY-176/DERBY-766 to increase the size of queries that Derby can handle, and yes, the examples in the largeCodeGen test are based upon real customer examples.
http://issues.apache.org/jira/browse/DERBY-766
http://issues.apache.org/jira/browse/DERBY-176

What follows is a dump of my thoughts on my current direction.

Code generation for large queries runs into the per-method code limit of 64k from the JVM spec. Derby currently already overflows into multiple methods in two ways:

1) At the caller level, i.e. the SQL compiler. There are a couple of approaches to this: one is counting statements and switching to a new method at 2,000 statements; the other is always generating a section in a sub-method and calling it from the primary method. The trouble here is that it's very much reactive: we see a failure to generate code when a query has lots of X, and so we only adjust the code that generates X.

2) At Derby's byte code compiler level. This is the best place, as the caller is unaware of it. The current approach is to start looking for places to split the method when its size is approaching the 64k limit. So after around 50k bytes of code the compiler will try to split the method if it can. This leads to a potential calling sequence like:

    // method returned to caller (i.e. the SQL compiler)
    public Object e0() {
        // ~50k of code
        return e0_1();
    }

    private Object e0_1() {
        // ~50k of code
        return e0_2();
    }

    private Object e0_2() {
        // remaining code
        // note the stack will be N frames deeper,
        // depending on the number of sub-methods we
        // had to generate, than if we had a single large method.
        return result;
    }

The code starts looking at around 50k because in its current form there are limited places where the split can occur, and since it may take a large number of instructions to get to a split point, the code is conservative.

One criterion is that the method's current stack depth must be zero at the split point. This simplifies a lot: there is no need to pass arguments to the sub-methods, and no concerns about later instructions requiring values already on the stack. Another current criterion is that the method has no parameters. This one is actually easy to fix, but it doesn't benefit Derby much, since the generated methods that take parameters tend not to have the stack depth dropping to zero, so they can't be split anyway. One other criterion is that the split point cannot be in the middle of a conditional block, which simply avoids a host of tricky issues.

If you mapped the methods that can currently be split to Java language code, they would look like a list of independent statements, e.g.

    public void bigMethod() {
        a();
        b(3);
        c(d());
        f = e();
        ...
    }

One can see that if you were coding in the Java language and the method got too big, this would be easy to split, but you wouldn't do it like the nested approach above. Instead, for the e0 example above, you would do this:

    public Object e0() {
        // ~64k of code
        e0_1();
        return e0_2();
    }

    private void e0_1() {
        // ~64k of code
    }

    private Object e0_2() {
        // remaining code
        // note the stack will be only one frame deeper
        // than if we had a single large method.
        return result;
    }

This is better for two reasons:

- The stack is only ever one frame deeper than with a single big method. With the current nested approach it is N frames deeper.

- There is no need to split early; each method can be used to its maximum capacity, leading to potentially fewer methods, e.g. 7 sub-methods instead of 8.

The other type of method that Derby generates, and currently cannot split, would look like this in the Java language:

    public Object e0() {
        return a(b(c(d(), e(), f()),
                   e(),
                   k()),
                 o(), j(), e());
    }

That is, deeply nested method calls, hence the stack depth never dropping to zero.
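Looking back at case 2) and the split criteria above, the current rule amounts to something like the following. This is a purely illustrative sketch, not Derby's actual byte code compiler classes; the class, field and constant names are invented for the example.

    // Hypothetical illustration of the current splitting criteria -- not Derby code.
    public class SplitWhileGenerating {

        // Conservative threshold: start looking for a split point well
        // before the JVM's 65535 byte per-method code limit.
        private static final int SPLIT_THRESHOLD = 50 * 1024;

        private int codeLength;        // bytes emitted so far in the current method
        private int stackDepth;        // current operand stack depth
        private int parameterCount;    // parameters of the method being generated
        private boolean inConditional; // currently inside a conditional block

        // Should the generator end the current method here and continue
        // in a new sub-method (producing the nested e0 -> e0_1 -> e0_2 chain)?
        boolean shouldSplitNow() {
            return codeLength >= SPLIT_THRESHOLD
                && stackDepth == 0      // nothing would need to be passed along
                && parameterCount == 0  // current restriction, noted above as easy to lift
                && !inConditional;      // never split mid conditional block
        }
    }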
Those deeply nested method calls come from the creation of the internal language result sets used to execute the query. These result sets are created through factory methods, some of which have up to 26 arguments, and some of those arguments will be deeply nested calls themselves. So with a SELECT with 5000 UNIONs there will probably be tens of thousands of result sets generated, with maybe one hundred thousand parameters in total. If you were going to split that in the Java language, you would take groups of method calls and put them in a separate method, something like:

    public Object e0() {
        return a(e0_1(), o(), j(), e());
    }

    private Object e0_1() {
        return b(e0_2(), e(), k());
    }

    private Object e0_2() {
        return c(d(), e(), f());
    }

So, I'm trying to generalize splitting a method into multiple methods so that it is contained within the byte code compiler, and eventually to remove the external places where method splitting is happening. Based on my current thinking I'm going to try and take a "post compilation" approach. The current internal approach occurs as the byte code for the method is being built up, but after thinking about this for a long time I believe splitting after the method is complete is the correct approach. It has the advantages listed above, reduced stack frames and fewer methods, and in some cases we would end up with a single method where today we split around the 50k limit.

One key requirement is the ability to understand the structure of already generated byte code. The current compiler just builds the instructions as it goes along; there is no code that can walk the instruction stream once it has been generated. That is, understanding how many bytes make up each instruction so that one can move through the underlying byte array in the correct sequence.

The next requirement is to be able to calculate the maximum stack depth for any arbitrary section of byte code so that it can be pushed into a sub-method. Today the maximum stack depth is calculated as the instruction stream is built, and since we currently split methods at the current instruction and then continue building in the new method, the max stack calculation for the new method is handled in the same way.

I'm now close to having a patch that performs just the two basic required functions: walking the byte code and calculating the maximum stack depth. I can first use this code in a sanity check, comparing it against the dynamically calculated maximum stack depth; if the two are not equal there is a bug in my walking code (which also has to handle conditionals). This would then form the basis for continuing on, which would be:

1) Identifying points to split the code. My current thought is that there are two cases:

   a) zero stack depth (the list-of-statements case);

   b) non-zero stack depth. Here I think split points have to be instructions such as method calls and putfields, where it's easy(ish) to identify the earliest point in the byte code that the instruction depends on, leading to a block of code that can be pushed into a sub-method (like above).

2) Pushing blocks of code into sub-methods. I just have some ideas here, such as detecting whether the method's parameters are referenced in the block to be moved; if not, there is no need to pass the parameters to the sub-method.

This approach also requires that the byte code compiler be able to build a method in memory bigger than the 64k limit as a first step, which can then be split after analysis into sub-methods. Luckily this is exactly what it does today :-) Though handling conditional blocks is an issue here.
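To make the two basic functions concrete (walking the instruction stream and recomputing the maximum stack depth), here is a deliberately tiny sketch, again not Derby's actual code. It only knows a handful of fixed-length opcodes and takes the stack effect of a method call as a caller-supplied value; the real code would have to cover the whole instruction set, variable-length instructions such as tableswitch and wide, derive the stack effect of each call from its descriptor in the constant pool, and follow conditional branches.

    // Hypothetical sketch of walking a finished byte array and recomputing
    // the maximum stack depth -- not Derby's actual byte code compiler.
    public class ByteCodeWalker {

        private static final int ALOAD_0       = 0x2a; // 1 byte,  pushes 1
        private static final int ICONST_3      = 0x06; // 1 byte,  pushes 1
        private static final int INVOKEVIRTUAL = 0xb6; // 3 bytes, effect depends on descriptor
        private static final int PUTFIELD      = 0xb5; // 3 bytes, pops objectref + value
        private static final int ARETURN       = 0xb0; // 1 byte,  pops 1

        // Returns the maximum operand stack depth reached by the code.
        // callStackDelta stands in for the per-call stack effect that would
        // normally be derived from the constant pool entry.
        public static int maxStack(byte[] code, int callStackDelta) {
            int depth = 0;
            int max = 0;
            int pc = 0;
            while (pc < code.length) {
                int opcode = code[pc] & 0xff;
                switch (opcode) {
                case ALOAD_0:
                case ICONST_3:
                    depth += 1;
                    pc += 1;       // single byte instruction
                    break;
                case INVOKEVIRTUAL:
                    depth += callStackDelta;
                    pc += 3;       // opcode + two byte constant pool index
                    break;
                case PUTFIELD:
                    depth -= 2;
                    pc += 3;
                    break;
                case ARETURN:
                    depth -= 1;
                    pc += 1;
                    break;
                default:
                    throw new IllegalStateException("unhandled opcode " + opcode);
                }
                if (depth > max) {
                    max = depth;
                }
            }
            return max;
        }
    }

The sanity check described above would then just compare the value returned by such a walker with the maximum stack depth the compiler tracked while it was emitting the instructions.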
I think that once this is working, it and a couple of other minor improvements will bump the limit on SQL statement complexity higher than any application is likely to require. Once the multiple-method work is done, the next big jump would be to support multiple classes per statement, but I'm not sure it's worth the effort.

One comment on risk and quality: I code such changes so that they can be invoked at a lower threshold than the optimal one, for example splitting to leave methods with a maximum size of 1k. Then I run derbyall with the limit lowered to ensure the code is exercised. I did this for the existing sub-methods, splitting at some value lower than 50k, and for the recent change for conditionals, invoking the re-writes at jump offsets greater than 50 bytes instead of 32767. Otherwise these changes would most likely not be exercised by the regular tests, though the largeCodeGen test is a start here. Running derbyall with lowered limits gives a greater chance of finding bugs in the initial implementation.

Sorry if this is a collection of unfocussed thoughts.

Dan.
