I added a couple additional quick interpreter optimizations this morning. After running some numbers through, I saw that we're creating roughly 170 times as many InstructionBundle objects as there are unique AST nodes in the system. I knew this was wasteful when I wrote it, but I figured now is a good time to clean it up a bit. In the patched code, when the default InstructionBundle for a given node is created, it is attached to that node from then on, since a given AST node will always associate with a given instruction. This means two things:

- default InstructionBundle instances will only be created once, on first encountering the node
- instructions that are not encountered will never incur the cost of creating an InstructionBundle

With this additional change in place, the numbers improve again:

binary-trees 10
with new fix: 17.275, 17.215, 17.345; 17.278, 16% faster
without any fix (original): 20.68, 20.6, 20.63; 20.636

The 16% faster is an improvement over the previous 10% faster. I have not run the other benchmarks.

The patch is attached, based on a recent CVSROOT dump from SF.NET (since it's STILL down).

I also ran some numbers to see how many nodes are being wasted. The modified interpreter (which is mostly the same as what I wrote last Fall) has the potential to waste nodes in the following situations: when a long method returns early, or when an exception is raised early in a method. The interpreter, upon encountering a block of nodes (at the same depth in the tree) will push all of those nodes onto an instruction stack. The first node to be executed is pushed last. Each node then may result in additional nodes pushed onto the stack, so the stack only ever contains the current branch of the AST and direct siblings of that branch's nodes. The wastefulness, then, comes when a large number of siblings are pushed onto the stack, only to be immediately removed because of a return or an interpreter event.

However, in practice it doesn't appear this happens very often. I turned on some logging to compare executed nodes versus popped nodes, and found that the waste is somewhere under 1% of all InstruuctionBundles created. I would surmise this is because each line in a script constitutes one node at that level (a NewlineNode), and everything else on the line comes into play only when the line is executed. So that's good news...we're not pushing and popping much more than we actually need to be.

On 3/30/06, Charles O Nutter <[EMAIL PROTECTED]> wrote:
I'm getting actual work done today, what with the SF.net CVS being down for over 12 hours now. At any rate, inbetween real work I ran another benchmark:

nestedloop 15
with fix: 55.422 , 55.094; 55.258, 19% faster
without: 68.781, 68.89; 68.8355


On 3/30/06, Charles O Nutter < [EMAIL PROTECTED]> wrote:
Internally, the interpreter creates an InstructionBundle for each node encountered in the AST. This allows it to link an executable instruction, a specific node from the AST, and a couple flags for ensured, breakable, etc into a single nugget. This is then pushed on to an "instruction stack" to maintain the current instruction and to handle nested flow-control structures.

It would seem that the guess that all this stack pushing and popping is a problem is at least partially true. The attached EvaluationState.java makes the InstructionBundle itself into a linked list node, eliminating much of the list management previously required. The results are very positive; in a few algorithms this speeds up JRuby by over 10%.

I ran three tests from the Language Shootout: nsieve, binary-trees, and recursive. The results are below. After a bit more testing (and after SF.NET's project CVS is back online), I will commit this fix, and get back into bugs and enhancements. It is very promising that small changes like this can improve performance so much.

nsieve 1
with fix: 16.25, 16.25, 16.313, 16.187, 16.265; 16.253, 1.8% faster
without fix: 16.578, 16.547, 16.531, 16.579; 16.558

binary-trees 11
with fix: 30.922, 30.719; 30.82, 10% faster
without: 34.359, 34.547; 34.453

recursive 4
with fix: 69.953, 69.578; 69.765, 12% faster
without: 79.906, 79.937; 79.921

--
Charles Oliver Nutter @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com




--
Charles Oliver Nutter @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com



--
Charles Oliver Nutter @ headius.blogspot.com
JRuby Developer @ jruby.sourceforge.net
Application Architect @ www.ventera.com

Attachment: instructionbundle_speedup3.patch
Description: Binary data

Reply via email to