On Mon, Apr 21, 2008 at 10:19:08AM -0700, chromatic wrote:
> I'm still exploring the Rakudo build progress as a profiling target for
> likely
> optimizations. After this weekend's work, I have src/gen_actions.pir
> generation down to 27,788,055,796 instructions (with an optimized Parrot). A
> big chunk of that time goes to support bsr_ic:
>
> 7,784,136,854 core.ops:Parrot_bsr_ic
> 7,775,231,886 stacks.c:stack_push
> 7,763,569,145 stack_common.c:stack_prepare_push
> 7,754,735,042 stack_common.c:cst_new_stack_chunk
>
> Why is it expensive? *Every* call to cst_new_stack_chunk() requests a free
> bufferlike object from the GC. 98% of the inclusive cost of these four
> functions is in running the GC.
>
> Someone who's familiar with the stack code (or wants to be) might be able to
> find a big optimization here.
To me, the scary part of src/stacks.c is at the beginning:
The stack is stored as a linked list of chunks (C<Stack_Chunk>),
where each chunk has room for one entry.
Eek! For something like bsr_ic, which is really just pushing a
return address (i.e., opcode_t *) onto a stack, we're allocating a
new GC-able object for every bsr invocation, and freeing it on ret.
Since PGE uses bsr/ret for its backtracking, that's a lot of allocations.
In fact, this seems to be the case for everything using the
"generic stack", which AFAICT is the &interp->dynamic_env structure.
So, everything that gets pushed onto this stack (exceptions,
continuations, coroutines, bsr/ret calls) ends up with a separate
gc-able structure (Stack_Chunk) to hold the stack entry.
So, switching PGE from bsr/ret to another control structure doesn't
give us a win here.
I think we'd get a BIG win if we changed the dynamic_env stack to
have an approach similar to ResizableIntegerArray, where we allocate
arrays of Stack_Chunk entries and manage them that way, instead of
a separate allocation per element in the stack.
In looking at the stacks.c code I think I could manage this
if nobody else is interested in playing with it.
> Then again, I remember someone saying at least some parts of the stack code
> should go away, and I'm all for that too.
The parts of the stack code we've discussed eliminating is actually
the user_stack, used for push/pop/saveall/restoreall opcodes. The
bsr/ret opcodes don't use this particular stack. However, I think that
if we eliminate the user_stack, we in fact simplify stacks.c by
a tremendous amount and could get even a bigger win (including eliminating
some unions and other C nastiness).
I think it's a big enough win for PGE and Rakudo that I'll be very
happy to work on both of these items in a branch -- i.e.,
eliminating the user stack ops and converting the remaining
dynamic_env structure to be something far more suitable for
GC. I think the performance gain we'll see will be, well, substantial.
Pm