Re: Register Allocation and Continuations problem definition

2005-06-03 Thread Bill Coffman
I hope that no one misunderstands when I say this. It is not meant to be a 
criticism, but rather it is my understanding of the problem, put very 
directly. I may be wrong, and in many ways, I hope am.

There is a fundamental flaw in the Parrot architecture. In it's current 
form, continuations and register allocation cannot coexist. The solution is 
to do away with the register allocator, or to do away with continuations. 
This can be done, perhaps by saying that certain functions won't be subject 
to continuation snapshots or calls. Then register allocation can be done on 
those routines. Or compiling the whole program with a switch of on or off.

When I say do away with the register allocator, I mean that nearly all 
registers will interfere, so essentially each variable gets a register. The 
extensible register stack is the best way to implement this strategy. If 
there are long sections of code with lots of temporary variables, then 
register allocation might be of some use. But most variables will interfere. 
Implementing this would be an interesting project. It would be quite 
different from a standard register allocator. It might even be worth writing 
a paper on the techniques involved.

Turning off full continuations, is a another matter. I don't know what 
impact that would have on parrot. It would be relatively straightforward to 
write the allocator with this approach.

There are other issues involved, regarding how registers change when called. 
What is preserved, and how. Matt touched on some of these issues. My main 
concearn in attempting to implement the register allocator was to find the 
minimal control flow graph (cfg) which helps to provide the variable 
interference graph. 

This is my understanding of the problem. Matt wants this thread to be a 
problem definition, so these thoughts seem appropriate for this thread. I 
hope no one is offended by my bluntness, but if this problem is to be 
solved, it must be understood.

Bill Coffman

ps: a couple important threads on this issue were
http://groups-beta.google.com/group/perl.perl6.internals/browse_frm/thread/3a61fe5e97d17389/0603ff13ca52f0ff?_done=%2Fgroup%2Fperl.perl6.internals%3F_doneTitle=Back+to+topics_doneTitle=Backd#0603ff13ca52f0ff
http://groups-beta.google.com/group/perl.perl6.internals/browse_frm/thread/214157bf29879710/4c15aa4a1f56c20a?_done=%2Fgroup%2Fperl.perl6.internals%3F_doneTitle=Back+to+topics_doneTitle=Backd#4c15aa4a1f56c20a


On 6/3/05, Matt Fowles [EMAIL PROTECTED] wrote:
 
 All~
 
 On 6/3/05, Bill Coffman [EMAIL PROTECTED] wrote:
 
  There are several threads in the parrot mailing list that discuss the
  continuations problem. I hoped to be able to address parrot register
  allocator again, at some point, but it could be a while before I get to
  that. In the mean time, search the perl6-internals mailing list for
  continuations and register allocation. I'm sure you'll find it.
 
 I would actually very much like to see this issue moved on. In the
 hopes of getting everything up to speed quickly I will try and
 summarize all of the discussions and suggestions that have been made.
 I am NOT trying to advocate any particular solution (yet), I just want
 to make sure that everyone is on the same page.
 
 I propose that we use this thread to ensure that we are not talking
 past each other. Discussions about specific solutions should and
 religion should have a seperate thread.
 
 
 PROBLEM:
 
 Continuations can be invoked repeatedly. Whenever a continuation is
 reinvoked, the registers must be in a defined state or the code will
 behave incorrectly.
 
 
 BASIC ANALYSIS
 
 The actual state of the registers does not matter as long as it is
 well defined.
 
 A) If the registers are defined to be garbage, then every continuation
 will start by returning them to the state it expects.
 
 B) If the registers are defined to be preserved, then every
 continuation merrily chugs along.
 
 Unfortunately these each have issues and as such much debate ensued.
 
 
 SUGGESTED SOLUTIONS
 
 1) Context includes registers
 
 + continuations don't have to preserve specifc registers
 + register allocator can ignore continuations wrt spilling
 - copying overheard
 - value based registers (IN) return to old values
 - pointer based registers (SP) cannot have their pointers moved or
 require double indirection
 
 
 2) Return Continuations include registers, non don't
 
 + register allocator can remain mostly ignorant
 + only non return continuations need to worry
 - promoting a return continuation will force copying as plan (1)
 
 
 3) register are not restored
 
 + simple to explain
 - register allocator must add many extra interference edges
 - largely prevents reuse of register
 
 
 4) variable sized register frames
 
 + never have unused registers
 + no need for a register allocator
 + could correspond directly to scratchpad
 - more complicated
 - no register reuse
 - large architectural change
 - more custom allocation (can't pool 

Re: register allocation

2004-11-13 Thread Leopold Toetsch
Bill Coffman [EMAIL PROTECTED] wrote:
 Hello,

 * I have the below failed tests.

 t/library/dumper.t   13  332813   13 100.00%  1-13

These are fixed (assignment collisions in pcc.c)

 t/op/gc.t 1   256181   5.56%  13

Is very similar to the case below - gc_14 (you might read that one
first, it's a bit simpler). The variable Carr2 is assigned to P17. In
gc_13.imc:80 the temp C$P0 is also getting this register.  Now when
the algorithm backtracks over y = choose(arr2) arr2 in P17 is
clobbered and we get shift_pmc() not implemented in class 'Closure'.
(You can turn on these print statements to see the flow of operations)

There is no sign in the source code, that the control flow from calling
choose() the second time to calling the fail closure might be a branch
target. We basically have a loop (due to the continuation usage), which
the register allocator can't be aware of.

Inserting a label e.g.

  lp:
y = choose(arr2)

and a branch lp after the call to the fail closure hides the
problem, because with that loop, Carr2 is preserved. (arr1 is still
clobbered, but the algorithm doesn't bactrack there again).


 t/pmc/sub.t   1   256781   1.28%  78

tb_loop:
set P17, P16[K16]
^

Somewhere a SymReg with set 'K' has slipped into the register allocator.
Excluding r-set != 'K' in build_reglist():405 fixes that.


Now failing too is t/op/gc_14 (and an exception test, which is the
same). These 2 are a bit tricky and I don't know, which result is
correct - or better there is probably no answer.

We have this recursive method:

  sub b11 method
.param int n
.local int n1
n1 = n + 1
newsub $P0, .Exception_Handler, catch
set_eh $P0
n = self.b11(n1)
clear_eh
  catch:
.pcc_begin_return
.return n
.pcc_end_return
  .end

Cn1 is going into the recursive call, Cn is the return result. Now
it depends on the register allocator which register Cn1 and Cn have.
When the allocator isn't really clever then it assigns different
registers. So Cn is preserved when the exception is fired as the
recursion limit is hit.

The new allocator assigns both Cn1 and Cn to the same Parrot
register. This is technically correct, as the life-span of Cn ends on
the line where it's used as a return result of the recursive method call.

Therefore we have a piece of code that in the presence of exceptions may
behave differently depending on some implementation details. This is
AFAIK very the same, as the gcc warning we had until a few days about a
possibly clobbered variable used around setjmp (s. src/inter_run.c:55,
and the variable offset. (It's a volatile now)

This is totally the same variable usage like above's Cn and the result
depends on register allocation details.

 ~Bill

leo


Re: register allocation

2004-11-12 Thread Leopold Toetsch
Bill Coffman wrote:
Hello,
* I have the below failed tests.  I haven't looked into these.  Can
someone tell me if the tests are broken, or is my allocator broken.  I
know they don't fail for the current cvs code (as of yesterday).
Failed TestStat Wstat Total Fail  Failed  List of Failed
---
t/library/dumper.t   13  332813   13 100.00%  1-13
t/op/gc.t 1   256181   5.56%  13
t/pmc/sub.t   1   256781   1.28%  78
These errors look remarkable the same, when I turn on OPT_SUB that is, 
when allocate_wanted_regs() is used. And this code did miss registers 
sets like 'K' (compound keys).

Changing imcc/reg_alloc.c:890 to:
  if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set 
== 0))

did fix this flaw.
leo


Re: register allocation

2004-11-12 Thread Bill Coffman
Interesting.  I did a little more research, and checked out a fresh
CVS tree.  I did nothing more than insert a conflict checker into
reg_alloc.c  That is, whatever pre-colored registers have been
assigned to symbols, I verify that none of them are conflicting. 
Guess what I got?

make[1]: Leaving directory `/usr/home/coffman/dev/parrot/parrot/docs'
./parrot -o runtime/parrot/library/Data/Dumper/Base.pbc
runtime/parrot/library/Data/Dumper/Base.imc
**  added new here
node 1 = defname(S) is colored 5 and neighbor 9 = S5(S) is colored 5
parrot: imcc/reg_alloc.c:200: imc_reg_alloc: Assertion `r-color==-1
|| r-color != unit-reglist[y]-color' failed.
make: *** [runtime/parrot/library/Data/Dumper/Base.pbc] Aborted

The routine you mentioned, allocate_wanted_regs() doesn't assign
wanted regs if in conflict, but as far as I can tell, there's no check
on the integrity of precolored nodes.  Not quite sure what the correct
behaviour is for allocator, when it's given conflicts.

I saw 'K' compounded key somewhere, but don't know what it is.  I know
reg_alloc.c pretty well though.  Offhand I'd say it's not handled in
reg_alloc.  Maybe it should be?

~Bill

On Fri, 12 Nov 2004 10:26:51 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Bill Coffman wrote:
  Hello,
  * I have the below failed tests.  I haven't looked into these.  Can
  someone tell me if the tests are broken, or is my allocator broken.  I
  know they don't fail for the current cvs code (as of yesterday).
 
  Failed TestStat Wstat Total Fail  Failed  List of Failed
  ---
  t/library/dumper.t   13  332813   13 100.00%  1-13
  t/op/gc.t 1   256181   5.56%  13
  t/pmc/sub.t   1   256781   1.28%  78
 
 These errors look remarkable the same, when I turn on OPT_SUB that is,
 when allocate_wanted_regs() is used. And this code did miss registers
 sets like 'K' (compound keys).
 
 Changing imcc/reg_alloc.c:890 to:
 
if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set
 == 0))
 
 did fix this flaw.
 
 leo
 



Re: register allocation

2004-11-12 Thread Bill Coffman
 These errors look remarkable the same, when I turn on OPT_SUB that is,
 when allocate_wanted_regs() is used. And this code did miss registers
 sets like 'K' (compound keys).
 
 Changing imcc/reg_alloc.c:890 to:
 
if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set
 == 0))
 
 did fix this flaw.
 
 leo

You're patch looks suspiciously like a typo.  strchr(ISPN, r-set == 0))
will always returns true if r-set != 0.  This means that the patch
effectively disables allocate_wanted_regs(), unless there are certain
cases where r-set==0, which I doubt.

Try sticking this toward the beginning of imc_reg_alloc(), but after
build_interference_graph, like imcc/reg_alloc.c:187 (or so)

#ifndef NDEBUG
{
int x,y;
for (x = 0; x  unit-n_symbols; x++) {
SymReg* r = unit-reglist[x];
for (y = 0; y  unit-n_symbols; y++) {
if (ig_test(x, y, unit-n_symbols, unit-interference_graph)) {
if(r-color!=-1  r-color == unit-reglist[y]-color)
fprintf(stderr,node %d = %s(%c) is colored %d and 
neighbor %d = %s(%c) is colored %d\n,
x,r-name,r-set,r-color, 
y,unit-reglist[y]-name,unit-reglist[y]-set,
unit-reglist[y]-color);
assert(r-color==-1 || r-color != unit-reglist[y]-color);
}
}
}
}
#endif

And before ...
while (todo) {


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Bill Coffman
On Mon, 8 Nov 2004 12:20:19 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Jeff Clites [EMAIL PROTECTED] wrote:
  On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
 
  If frames aren't adjacent, normal argument copying can be done anyway.
 
  This would seem to require the same types of runtime checks that you
  are objecting to below,
 
 Not runtime. The register allocator knows the amount of needed
 non-volatiles and volatile registers. Depending on that a call can place
 arguments directly into the incoming regs of the caller or not. Then if
 at runtime copying is needed (because e.g. frames aren't adjacent), the
 function arguments are copyied. This is fully transparent.
 
  I discussed that in item (1) under Further notes.
 
 Yep, saw that then ;)
 
  It's not needed. I've a better scheme in mind, which addressess
  efficieny as well as argument passing.
 
  And spilling?
 
 Well, I'm proposing a variable-sized register frame. With very little
 additions we could run with more then 32 registers per kind (there are a
 few bitmasks currently that would need adaptions, but not much).

I think this is a great idea.  Mostly, I was thinking of the case
where there are less than 32 registers needed.  The allocator can try
to reduce the number of registers that will need to be used.  I think
that addressing could become an issue is there are too many registers,
but I still haven't figured out how the bytecode looks yet.

 But basically, spilling should not be needed at all, if the register
 allocator isn't as broken as it now is. Dan got some really evil
 subroutines, so we have RL test cases. We'll see.

In that case, I'll focus on the register renaming, live-range
analysis, etc.  The memory leak, which might not be actually lost
memory, but just wasteful useage of memory, is pretty much related to
the spill code.  Most of the grotesqueness of the allocator is in the
spiller.  If this has a high probability of changing, I'll focus on
other stuff, like register renaming, which is a lot harder, but with
higher longterm payoff.

In light of this, I think I'll send in my patch, in case it is
helpful.  The problem is that some routine outside of imc_reg_alloc()
is sending in conflicting register preallocation.  My code does lot's
of assertions, and finds this, one way or another.  I'll put in a
variable to turn off color consistency checking, which will probably
then pass the tests.

Bill

 
  JEff
 
 leo



Re: Register allocation/spilling and register volitility

2004-11-12 Thread Leopold Toetsch
Bill Coffman wrote:
In that case, I'll focus on the register renaming, live-range
analysis, etc.  
Great.
In light of this, I think I'll send in my patch, in case it is
helpful.  The problem is that some routine outside of imc_reg_alloc()
is sending in conflicting register preallocation. 
Yep. I've already found two flaws within pcc.c. One is fixed (thanks to 
your allocation conflict check).

The second is tricky: we have
   _global_dumper()
   ...
   object.dumper(pmc, name)# args are P5, S5
and _global_dumper() does:
   .return(object)
The function is returning the object in P5, which means that return 
conventions are indicating a PMC result in P5, and that get's copied 
into the caller's frame. But the caller doesn't use it. The usage of P5 
in the caller (pmc) is clobbered...

I don't know, if that is an abuse of prototyped calling conventions (we 
need a definition WRT call/return argument mismatches).

If not, we can never assign possible return values around a function call.
Bill
leo


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Michel Pelletier
 Date: Fri, 12 Nov 2004 04:10:20 -0800
 From: Bill Coffman [EMAIL PROTECTED]

   And spilling?
  
  Well, I'm proposing a variable-sized register frame. With very little
  additions we could run with more then 32 registers per kind (there are a
  few bitmasks currently that would need adaptions, but not much).
 
 I think this is a great idea.  

So do I.  Having a fixed number of registes is like having the C keywords 
register0 through registerN.  All registers should be $Pn registers and let 
the guts deal with how that maps physically.  I certainly don't care which 
$Pn register maps to which Pn register, should a PIR programmer ever care?   
As far as I can tell the only reason why a PIR programmer would even use Pn 
registers is for implicit arguments, leading me into my next point:

I strongly feel that Leo was right to suggest removing implicit register 
arguments from opcodes.  I responded to that email but somehow it didn't get 
to the list AFAICT.  I can't think of any good reason or use case for 
implicit arguments.  Can one be provided?  From my experiences with Parakeet, 
it's just more weird, implicit stuff that I have to keep in my head that 
confuses new PIR programmers who look at my code.

Since I'm on the box I might as well get it all out: I don't think 
unprototyped pcc calls should use the first 11 arguments in P registers and 
the rest in an array, again it just doesn't make any sense to me other than 
to violate DON (Don't Optimize Now, in tribute to Don Knuth).  Just pass an 
array and be done with it.  If the guts want to register-optimize arguments 
then fine, why does the PIR programmer have to do it by hand? (this is really 
an specific extension of the argument against implicit register usage)

Unprototyped arguments are usually built dynamically at run time anyway or 
called in from an external language.  I wager the performance benefit is 
miniscule to nothing compared to the contortious PIR code needed to pack and 
unpack arguments into registers, spill arguments into an array, avoid those 
11 P regisers all the time, and the pressure it all puts on the register 
allocator.  For prototyped calls I can (barely) see this as being useful, but 
not unprototyped.

Fixed registers, implicit arguments, and calling contortions are dead 
chickens (coined, AFAIK, by Jim Fulton).   They're the completely useless 
things you have to wave over the machine to get it to do what you want.  The 
less dead chickens you have to wave over Parrot, the better.

-Michel


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Dan Sugalski
At 10:51 AM -0800 11/12/04, Michel Pelletier wrote:
  Date: Fri, 12 Nov 2004 04:10:20 -0800
 From: Bill Coffman [EMAIL PROTECTED]

   And spilling?
 
  Well, I'm proposing a variable-sized register frame. With very little
  additions we could run with more then 32 registers per kind (there are a
  few bitmasks currently that would need adaptions, but not much).
 I think this is a great idea. 
So do I.
I don't. It's staying the way it is, as are the implicit operands to 
some of the ops, and the unprototyped calling conventions.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-11-11 Thread Leopold Toetsch
Bill Coffman [EMAIL PROTECTED] wrote:
 Hello,

 I have addressed almost all the code issues from the last patch.

 * I have the below failed tests.

 Failed TestStat Wstat Total Fail  Failed  List of Failed
 ---
 t/library/dumper.t   13  332813   13 100.00%  1-13
 t/op/gc.t 1   256181   5.56%  13
 t/pmc/sub.t   1   256781   1.28%  78

These tests should be sane. While the first two .t are fairly complex,
t/pmc/sub_78.imc is quite simple. You could compare the produced code
of parrot -o- foo.imc. Run it with -t to see, where output differs.

 * Still have a memory leak that blocks Dan's evil code from compiling.

$ valgrind --leak-check=yes --show-reachable=yes ./parrot --leak-test foo.imc

(there are some other leaks, so just have a look at increasing leak
counts for increased symbol usage)

 * Run time isn't so great,

We'll address that later.

Please post patch.

leo


Re: Register allocation/spilling and register volitility

2004-11-08 Thread Leopold Toetsch
Jeff Clites [EMAIL PROTECTED] wrote:

 OTOH it doesn't really matter, if the context structure is in the
 frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
 valid as I0 or I4, as long as it's assured, that it's exactly
 addressing the incoming argument area of the called function.

 A problem with this is that you can't be sure that you can actually
 have the next frame of registers adjacent to the current frame--they
 might already be taken. Imagine A calls B, then B creates a
 continuation and stores it in a global, then returns.

Please read the proposal summary by Miroslav Silovic, keyword watermark.
If frames aren't adjacent, normal argument copying can be done anyway.

 Keep the old-scheme registers inside the interpreter structure, *as
 well as* the new indirect registers. Call the registers in the
 interpreter I0 to I31, and the indirect registers I32 to whatever.

That would need two different addressing modes depending on the register
number. That'll lead to considerable code bloat: we'd have all possible
permutations for direct/indirect registers. Doing it at runtime only
would be a serious slowdown.

It's not needed. I've a better scheme in mind, which addressess
efficieny as well as argument passing.

leo


Re: Register allocation/spilling and register volitility

2004-11-08 Thread Jeff Clites
On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
OTOH it doesn't really matter, if the context structure is in the
frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
valid as I0 or I4, as long as it's assured, that it's exactly
addressing the incoming argument area of the called function.

A problem with this is that you can't be sure that you can actually
have the next frame of registers adjacent to the current frame--they
might already be taken. Imagine A calls B, then B creates a
continuation and stores it in a global, then returns.
Please read the proposal summary by Miroslav Silovic, keyword 
watermark.
If frames aren't adjacent, normal argument copying can be done anyway.
This would seem to require the same types of runtime checks that you 
are objecting to below, unless user code is expected to explicitly 
check whether it's supposed to be assigning to I3 v I(3 + 32x?). And 
that still seems to require copying, in my case of a function a() which 
calls a function b() with the exact same arguments.

Keep the old-scheme registers inside the interpreter structure, *as
well as* the new indirect registers. Call the registers in the
interpreter I0 to I31, and the indirect registers I32 to whatever.
That would need two different addressing modes depending on the 
register
number. That'll lead to considerable code bloat: we'd have all possible
permutations for direct/indirect registers. Doing it at runtime only
would be a serious slowdown.
I discussed that in item (1) under Further notes.
It's not needed. I've a better scheme in mind, which addressess
efficieny as well as argument passing.
And spilling?
JEff


Re: Register allocation/spilling and register volitility

2004-11-08 Thread Leopold Toetsch
Jeff Clites [EMAIL PROTECTED] wrote:
 On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:

 If frames aren't adjacent, normal argument copying can be done anyway.

 This would seem to require the same types of runtime checks that you
 are objecting to below,

Not runtime. The register allocator knows the amount of needed
non-volatiles and volatile registers. Depending on that a call can place
arguments directly into the incoming regs of the caller or not. Then if
at runtime copying is needed (because e.g. frames aren't adjacent), the
function arguments are copyied. This is fully transparent.

 I discussed that in item (1) under Further notes.

Yep, saw that then ;)

 It's not needed. I've a better scheme in mind, which addressess
 efficieny as well as argument passing.

 And spilling?

Well, I'm proposing a variable-sized register frame. With very little
additions we could run with more then 32 registers per kind (there are a
few bitmasks currently that would need adaptions, but not much).

But basically, spilling should not be needed at all, if the register
allocator isn't as broken as it now is. Dan got some really evil
subroutines, so we have RL test cases. We'll see.

 JEff

leo


Re: register allocation questions

2004-10-29 Thread Leopold Toetsch
Bill Coffman [EMAIL PROTECTED] wrote:

 Currently, here's how the register allocator is doing.

 Failed TestStat Wstat Total Fail  Failed  List of Failed
 ---
 t/library/dumper.t5  1280135  38.46%  1-2 5 8 13
 4 tests and 51 subtests skipped.
 Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

 I recall Leo, or someone, saying that the data dumper routines are not
 following the calling convention properly.

I didn't look too close, but it's probably only the entry points:

  .sub _dumper
  _global_dumper()

That's missing C.param statements, so there are none.

 I've learned a lot about how the compiler works at this point, and I'd
 like to contribute more :)

Great. Thanks.

 Would you like a patch?  Should I fix the data dumper routines first?

Definitely - dumper.t tests are currently disabled, don't worry.

 What is all this talk about deferred registers?  What should I do
 next?

deferred registers doesn't make bells ring. What do you mean with
that? - Send patch with explanation of algorithm.

 Yes, I think we are kind of doing this.  It's best to pass the
 registers straight through though.  Like when a variable will be used
 as a parameter, give it the appropriate reg num.  Sort of outside the
 immediate scope of register coloring, but as I've learned, one must go
 a little beyond, to see the input and output for each sub.

Well, it's not really outside of register coloring. It's part of
parrot's calling conventions. You can think of it as part of the Parrot
machine ABI. When you write a compiler for darwin-PPC, you have to pass
function arguments in r3, r4, ... and you get a return value in r3. If
you don't do that, you'll not be able to make any C library call.

In Parrot we have similar calling conventions and the register allocator
must be aware of that. E.g. when you have:

some_function()   # (i, j) = some_function()
$I0 = I5
$I1 = I6

you know that I5 and I6 are return results. The live range or the
previous usage of I5 and I6 is cut by the function call.

Using the return values directly is of course an optimization and not
strictly necessary, nethertheless the allocator has to be aware that the
function call invalidates previous I5 and I6.

 But the idea is to have each sub declare how many registers to
 save/restore.

Don't worry about save/restore. That's already changed. imcc doesn't
emit any savetop/restoretop or similar opcodes any more. Registers are
preserved now by allocating a new register frame for the subroutine.

 We can also minimize this number to match the physical architecture
 that parrot is running on (for an arch specific optimization).

Yes. I did that some time ago in imcc/jit.c, which produced register
mapping for the underlying hardware CPU. Parrot registers 0.. n-1 were
given negative numbers and src/jit.c used these directly as mapping for
CPU registers. This vastly reduced JIT startup time.

 Yes, yes, renaming!  I want to do register renaming!

Go for it please.

 p31 holds all the spill stuff.  It's a pain.  Maybe I'll move that
 around, but if p31 is used, it means that there is no more room for
 symbols, in at least one of the reg sets.

I'd say that with register renaming, spilling will be very rare. But
there is of course no need to use P31 for it. If we really have to spill
we can optimize that a bit.

 - Bill Coffman

leo


Re: register allocation questions

2004-10-29 Thread Leopold Toetsch
Leopold Toetsch [EMAIL PROTECTED] wrote:
 Bill Coffman [EMAIL PROTECTED] wrote:

 t/library/dumper.t5  1280135  38.46%  1-2 5 8 13

 I didn't look too close, but it's probably only the entry points:

   .sub _dumper
   _global_dumper()

Fixed.

leo


Re: register allocation questions

2004-10-28 Thread Dan Sugalski
At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

So, if you want that really super efficient, you would allocate
registers around function calls directly to that wanted register number,
which should be in the SymReg's want_regno.
While true, in the general case leaving 0-15 as non-preferred 
registers will probably make things easier. Those registers, 
especially the PMC ones, are going to see a lot of thrash as 
function calls are made, and it'll probably be easier to have them 
as scratch registers.
Yep, that's the easy part ;) OTOH when the register allocator is 
doing register renaming anyway, the most inner loop with a function 
call should get registers assigned already matching the calling 
convemtions. With more then one call at that loop level, you have to 
move around registers anyway.
Oh, sure, but keeping your scratch PMCs out of the way makes life a 
lot easier for the register coloring algorithms. Might not be 
optimal, but if it makes life simpler to start, optimal can come 
later.

It's distinctly possible, of course, that there'll be very little 
pressure to actually *use* them for most code, as we've got plenty 
of registers in general. That's the hope, at least.
Yes, 16 regs are plenty and do suffice for all normal[1] code. 
Assigning to wanted reg numbers for a function is a nice 
optimization.

[1] all except Dan's 6000 lines subroutines :) Did you start 
creating real subs for your code already?
I wish. :( Unfortunately not, outside some simple stuff, and I doubt 
I will. The language just doesn't lend itself to that sort of thing. 
We're going to add actual real subroutines to the language after we 
roll out into production, but that doesn't help now, alas.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation questions

2004-10-28 Thread Bill Coffman
Hi all,

Thanks for your continued comments.  Btw, I usually read all the
parrot list, so don't think I'm not paying attention.

Currently, here's how the register allocator is doing.

Failed TestStat Wstat Total Fail  Failed  List of Failed
---
t/library/dumper.t5  1280135  38.46%  1-2 5 8 13
4 tests and 51 subtests skipped.
Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

I recall Leo, or someone, saying that the data dumper routines are not
following the calling convention properly.  So I've decided not to
worry about it too much.  It passes the other tests, plus the
randomized tests that I created, up to 150 symbols.  At that range, it
still takes about 20x longer than g++ -O2, for equivalent programs to
compile (see gen4.pl).

Also, it is currently running about O(n^2) for n symbols, where the
old one was running about O(n^3) from my analysis.  The spill code is
still very expensive, and has a large constant associate.  I also have
data, which is attached.  The difference doesn't show up until a lot
of spilling is going on, around 80 symbols or so.

I've learned a lot about how the compiler works at this point, and I'd
like to contribute more :)

Would you like a patch?  Should I fix the data dumper routines first? 
What is all this talk about deferred registers?  What should I do
next?

Well, I'm making some comments on the below stuff.

On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
 Dan Sugalski wrote:
 At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
 
 So, if you want that really super efficient, you would allocate
 registers around function calls directly to that wanted register number,
 which should be in the SymReg's want_regno.

Yes, I think we are kind of doing this.  It's best to pass the
registers straight through though.  Like when a variable will be used
as a parameter, give it the appropriate reg num.  Sort of outside the
immediate scope of register coloring, but as I've learned, one must go
a little beyond, to see the input and output for each sub.

 While true, in the general case leaving 0-15 as non-preferred
 registers will probably make things easier. Those registers,
 especially the PMC ones, are going to see a lot of thrash as
 function calls are made, and it'll probably be easier to have them
 as scratch registers.

I guess I don't agree.  I'd like to pack down the number of registers
used to a minimum.  Then when a function is called, only those needed
registers are copied in/out.  Don't think the functionality exists. 
But the idea is to have each sub declare how many registers to
save/restore.  This would then save 0-k such registers.  Where k is
the number of registers used by the sub.  Pack 'em down, minimize the
number needed.

We can also minimize this number to match the physical architecture
that parrot is running on (for an arch specific optimization).  The
imc_reg_alloc function does not have 32 hard coded in there (well a
little bit, but can be easily changed).  It's pretty dynamic.

 Yep, that's the easy part ;) OTOH when the register allocator is
 doing register renaming anyway, the most inner loop with a function
 call should get registers assigned already matching the calling
 convemtions. With more then one call at that loop level, you have to
 move around registers anyway.

Yes, yes, renaming!  I want to do register renaming!

 Oh, sure, but keeping your scratch PMCs out of the way makes life a
 lot easier for the register coloring algorithms. Might not be
 optimal, but if it makes life simpler to start, optimal can come
 later.

p31 holds all the spill stuff.  It's a pain.  Maybe I'll move that
around, but if p31 is used, it means that there is no more room for
symbols, in at least one of the reg sets.

 [1] all except Dan's 6000 lines subroutines :) Did you start
 creating real subs for your code already?
 
 I wish. :( Unfortunately not, outside some simple stuff, and I doubt
 I will. The language just doesn't lend itself to that sort of thing.
 We're going to add actual real subroutines to the language after we
 roll out into production, but that doesn't help now, alas.

Interesting.  I'd like to test on something like that.  Maybe SPEC99 as well.

- Bill Coffman


compile.dat
Description: Binary data


compile.plot
Description: Binary data
attachment: compile.png

Re: register allocation questions

2004-10-28 Thread Dan Sugalski
At 3:08 PM -0700 10/28/04, Bill Coffman wrote:
 It passes the other tests, plus the
randomized tests that I created, up to 150 symbols.  At that range, it
still takes about 20x longer than g++ -O2, for equivalent programs to
compile (see gen4.pl).
Still, that's not bad.
Also, it is currently running about O(n^2) for n symbols, where the
old one was running about O(n^3) from my analysis.  The spill code is
still very expensive, and has a large constant associate.  I also have
data, which is attached.  The difference doesn't show up until a lot
of spilling is going on, around 80 symbols or so.
I'm curious to see how it behaves once the spilling gets up into the 
1000+ symbol range. Dropping from cubic to quadratic time ought to 
make a not-insignificant change in the running time, even if that 
constant's pretty big. :)

I've learned a lot about how the compiler works at this point, and I'd
like to contribute more :)
Would you like a patch?
Yes! Oh, yeah, definitely.
Well, I'm making some comments on the below stuff.
On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
 Dan Sugalski wrote:
  At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
  While true, in the general case leaving 0-15 as non-preferred
 registers will probably make things easier. Those registers,
 especially the PMC ones, are going to see a lot of thrash as
 function calls are made, and it'll probably be easier to have them
 as scratch registers.
I guess I don't agree.  I'd like to pack down the number of registers
used to a minimum.  Then when a function is called, only those needed
registers are copied in/out.  Don't think the functionality exists.
But the idea is to have each sub declare how many registers to
save/restore.  This would then save 0-k such registers.  Where k is
the number of registers used by the sub.  Pack 'em down, minimize the
number needed.
We can also minimize this number to match the physical architecture
that parrot is running on (for an arch specific optimization).  The
imc_reg_alloc function does not have 32 hard coded in there (well a
little bit, but can be easily changed).  It's pretty dynamic.
By all means, go for it. I certainly don't want to curb your 
enthusiasm. It's the right thing to do, ultimately. I didn't want to 
presume on your time. Happy to have it, of course. :)

  [1] all except Dan's 6000 lines subroutines :) Did you start
 creating real subs for your code already?
 I wish. :( Unfortunately not, outside some simple stuff, and I doubt
 I will. The language just doesn't lend itself to that sort of thing.
 We're going to add actual real subroutines to the language after we
 roll out into production, but that doesn't help now, alas.
Interesting.  I'd like to test on something like that.  Maybe SPEC99 as well.
If you've got a patch, I'd be more than happy to give it a whirl, and 
I can likely get you a copy of the code in question to give a run on.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-10-28 Thread Bill Coffman
When I cvs up'd, cleared and reConfigure'd I got these stats:

Failed Test Stat Wstat Total Fail  Failed  List of Failed
---
t/library/streams.t1   256211   4.76%  11
t/op/gc.t  1   256181   5.56%  13
4 tests and 66 subtests skipped.
Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay.



 I'm curious to see how it behaves once the spilling gets up into the
 1000+ symbol range. Dropping from cubic to quadratic time ought to
 make a not-insignificant change in the running time, even if that
 constant's pretty big. :)

I think it's a bit more complicated.  M=number lines in code, N=number
variables.
- Time = O(M^2+N^2)  
- old time = O(M^2 +N^3)
Not quite sure of this either.  But the N^3 eventually dominates, I
think.  The data seems to bear this out.

There are more fixes I'd like to make as well.  I spotted several
things that could be fixed.  And I think the spill code can be
optimized a lot to reduce the big-O time as well.

More statistics #vars v. time in seconds: 
#vars  gcc  parrot2
200   7.92   89.20
201   11.86   146.31
202   18.11   246.37
203   9.54   107.88
204   11.81   134.60
205   14.75   190.95
206   13.25   161.83
207   10.63   138.83
208   11.02   117.73
209   7.14   88.29
210   15.14   176.69

I am also running gen3.pl with 1000 vars.  It's still on gcc.  We'll
see if parrot doesn't crash my 1Gigabyte, 2.4Ghz workstation tonight.

 Would you like a patch?
 
 Yes! Oh, yeah, definitely.
 [...]
 If you've got a patch, I'd be more than happy to give it a whirl, and
 I can likely get you a copy of the code in question to give a run on.

Soon, I'll send one the proper way.


  The
 imc_reg_alloc function does not have 32 hard coded in there (well a
 little bit, but can be easily changed).  It's pretty dynamic.
 By all means, go for it. I certainly don't want to curb your
 enthusiasm. It's the right thing to do, ultimately. I didn't want to
 presume on your time. Happy to have it, of course. :)

Thanks.  I've had a great time doing this.  Remembering graph
algorithms and compilers.  Great fun!  I'd also like to contribute to
getting Parrot out there, sooner rather than later.  So if I can help
with that, I'd like to hear suggestions.

-Bill


Re: register allocation questions

2004-10-28 Thread Bill Coffman
Thanks Matt,

I hope I can help out.  The patch I am submitting actually does
simplify register coloring a bit.  I've been waiting for perl6 with so
much anticipation, I just couldn't stand it any more, and I had to
participate.

-Bill


On Thu, 28 Oct 2004 18:17:57 -0400, Matt Fowles [EMAIL PROTECTED] wrote:
 Bill~
 
 I have to say that I am really impressed by all of the work that you
 are doing, and if you can make the internals of imcc a little more
 approachable, you would be doing a great service.
 
 Thanks,
 Matt
 
 
 



Re: register allocation questions

2004-10-27 Thread Dan Sugalski
At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
Bill Coffman [EMAIL PROTECTED] wrote:
  1)  In the existing parrot code, when a register is assigned, it uses
 the following code:

 int c = (color + MAX_COLOR/2) % MAX_COLOR;

 Thus, it seems to prefer to use register #16 and up, first, before it
 uses 0-15.  This is mistifying to me, since it's not so trivial to
 code it this way,
Well, some time ago, register allocation started at zero. The split of
register frames in upper and lower halfs *plus* the premature
optimization to save only the upper half of registers made it necessary
to allocate from 16 up.
But things are a bit more compilcated. For function calls, we are
passing arguments from register x5 ..x15 and I0..I4 plus some more have
a special meaning. See docs/ppds/pdd03_calling_conventions for all the
details. The same convention is used for function returns.
So, if you want that really super efficient, you would allocate
registers around function calls directly to that wanted register number,
which should be in the SymReg's want_regno.
While true, in the general case leaving 0-15 as non-preferred 
registers will probably make things easier. Those registers, 
especially the PMC ones, are going to see a lot of thrash as function 
calls are made, and it'll probably be easier to have them as scratch 
registers.

It's distinctly possible, of course, that there'll be very little 
pressure to actually *use* them for most code, as we've got plenty of 
registers in general. That's the hope, at least.

--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation questions

2004-10-27 Thread Leopold Toetsch
Dan Sugalski wrote:
At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

So, if you want that really super efficient, you would allocate
registers around function calls directly to that wanted register number,
which should be in the SymReg's want_regno.
While true, in the general case leaving 0-15 as non-preferred registers 
will probably make things easier. Those registers, especially the PMC 
ones, are going to see a lot of thrash as function calls are made, and 
it'll probably be easier to have them as scratch registers.
Yep, that's the easy part ;) OTOH when the register allocator is doing 
register renaming anyway, the most inner loop with a function call 
should get registers assigned already matching the calling convemtions. 
With more then one call at that loop level, you have to move around 
registers anyway.

It's distinctly possible, of course, that there'll be very little 
pressure to actually *use* them for most code, as we've got plenty of 
registers in general. That's the hope, at least.
Yes, 16 regs are plenty and do suffice for all normal[1] code. 
Assigning to wanted reg numbers for a function is a nice optimization.

[1] all except Dan's 6000 lines subroutines :) Did you start creating 
real subs for your code already?

leo


Re: register allocation questions

2004-10-26 Thread Leopold Toetsch
Bill Coffman [EMAIL PROTECTED] wrote:
 Hello All,

 I have been hard at work, trying to grok the reg_alloc.c code, and
 with some success.  My code is assigning registers, so that none are
 conflicting (which I double-verify), and I'm getting to the end of
 make.

Wow.


 1)  In the existing parrot code, when a register is assigned, it uses
 the following code:

 int c = (color + MAX_COLOR/2) % MAX_COLOR;

 Thus, it seems to prefer to use register #16 and up, first, before it
 uses 0-15.  This is mistifying to me, since it's not so trivial to
 code it this way,

Well, some time ago, register allocation started at zero. The split of
register frames in upper and lower halfs *plus* the premature
optimization to save only the upper half of registers made it necessary
to allocate from 16 up.

But things are a bit more compilcated. For function calls, we are
passing arguments from register x5 ..x15 and I0..I4 plus some more have
a special meaning. See docs/ppds/pdd03_calling_conventions for all the
details. The same convention is used for function returns.

So, if you want that really super efficient, you would allocate
registers around function calls directly to that wanted register number,
which should be in the SymReg's want_regno.

 2)  When function imc_reg_alloc (the main register allocation thingy)
 is called, some of the variables have already expressed a preference
 as to which register they want.  I understand that this can optimize
 certain parameter passing, etc.  The question that arrises is, what if
 two conflicting variables want the same color (reg#)?  Obviously, they
 don't get their way.  But what are the consequences, and what must be
 done to fix this.

Ah, ok. For each function call registers are moved around to the
correct number, if they aren't there. This is done by code in
imcc/pcc.c. But the problem still is there that your code really
shouldn't assign these registers in a conflicting way.

 ...  Incidentally, reg
 allocation is done on the following subs:
 _main __library_dumper_onload _dumper _register_dumper _global_dumper
[ ...]

Sure. Register allocation code is done per .sub/.end always. So that's
fine. But you might start with simpler source code having one or two
function calls only.

 Maybe someone can point me at something to look at to fix this.  I'll
 provide the patch if someone would like to play with it.

The main problem probably is to follow register usage in calling
conventions.

BTW: preserving the upper half of registers will be tossed soon. To
simulate this behavior in current CVS, you could run the code with -Oc,
which should preserve all allocated registers. I don't know, when the
indirect register frame patch whill hit CVS, but it should be in a few
days. With that in place, you can allocate registers as you like,
with the caveat that rules in PDD03 are used.

 Thanks to all who made it this far,
 Bill Coffman

leo


Re: register allocation questions

2004-10-25 Thread Dan Sugalski
At 6:18 PM -0700 10/25/04, Bill Coffman wrote:
Hello All,
I have been hard at work, trying to grok the reg_alloc.c code, and
with some success.  My code is assigning registers, so that none are
conflicting (which I double-verify), and I'm getting to the end of
make.
Woohoo! A non-trivial thing. :)
Here's some questions:
1)  In the existing parrot code, when a register is assigned, it uses
the following code:
   for (color = 0; color  MAX_COLOR; color++) {
int c = (color + MAX_COLOR/2) % MAX_COLOR;
if (!colors[c]) {
reglist[x]-color = c;
debug(interpreter, DEBUG_IMC,
#[%s] provisionally gets color [%d]
 (%d free colors, score %d)\n,
reglist[x]-name, c,
free_colors, reglist[x]-score);
break;
}
}
Thus, it seems to prefer to use register #16 and up, first, before it
uses 0-15.  This is mistifying to me, since it's not so trivial to
code it this way, and yet I cannot figure out why this color
preference.  It seems like maybe there's a reason which I need to
understand here.  My new code doesn't do this behaviour.
Registers 0-15 are the registers used for sub/method/function calls. 
Leaving them empty makes calls easier -- if the low-registers are in 
use you need to shuffle the contents somewhere else.

[Snippage]
Looks like we've got some... incidental behavior. This isn't 
particularly good. Feel free to rip out anything you want -- this is 
all internals stuff, much of it's first-generation code, and nobody's 
got any attachment to it. (I hope :)

Incidentally, reg
allocation is done on the following subs:
_main __library_dumper_onload _dumper _register_dumper _global_dumper
__library_data_dumper_onload dumper
__library_data_dumper_default_onload dumpWithName dumpCached
dumpProperties dumpHash dumpStringEscaped pmcDefault pmcPMCArray
pmcIntList pmcStringArray pmcPerlArray pmcPerlHash pmcPerlString
pmcPerlInt pmcPerlNum pmcPerlUndef pmcSub pmcNull pmcOrderedHash
pmcManagedStruct pmcUnManagedStruct pmcArray
__library_data_dumper_base_onload prepare cache createIndent indent
newIndent deleteIndent dump simple String
which seems like a lot to me, but I guess it's compiling a lot of stuff.
That doesn't seem right. Leo may well have an idea of why. On the 
other hand, I'm fine with ripping a lot of this stuff out and 
centralizing it.

--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-13 Thread Thomas Fjellstrom
On August 9, 2004 02:34 pm, Dan Sugalski wrote:
 At 4:12 PM -0400 8/9/04, Melvin Smith wrote:
 At 11:22 AM 8/9/2004 -0400, Dan Sugalski wrote:
 At 1:13 PM +0200 8/8/04, Leopold Toetsch wrote:
 Leopold Toetsch [EMAIL PROTECTED] wrote:
   You can verify this step by running -v:
 
 $ parrot -v inv_mod.imc 21 | grep symb
 build_reglist: 5783 symbols
 allocate_non_interfering, now: 2205 symbols
 
 It really should help.
 
 It did. I'm not sure how long the build ran (the last time I
 checked it'd racked up over 150 minutes of CPU time on a 2GHz
 dual-processor system, and may have run up to about 180 CPU
 minutes) but it finished.
 
 Now we need to cut down the runtimes just a touch. :)
 
 DSWEPIC
 
 Dan Stop Writing Evil Pathological Intermediate Code.

 Hah! I wish I was writing this. Instead, I'm writing a compiler for
 an Evil Pathological Language. :(

Perl? :)

-- 
Thomas Fjellstrom
[EMAIL PROTECTED]
http://strangesoft.net


Re: register allocation

2004-08-13 Thread Dan Sugalski
At 12:24 AM -0600 8/13/04, Thomas Fjellstrom wrote:
On August 9, 2004 02:34 pm, Dan Sugalski wrote:
 At 4:12 PM -0400 8/9/04, Melvin Smith wrote:
 At 11:22 AM 8/9/2004 -0400, Dan Sugalski wrote:
 At 1:13 PM +0200 8/8/04, Leopold Toetsch wrote:
 Leopold Toetsch [EMAIL PROTECTED] wrote:
   You can verify this step by running -v:
 
 $ parrot -v inv_mod.imc 21 | grep symb
 build_reglist: 5783 symbols
 allocate_non_interfering, now: 2205 symbols
 
 It really should help.
 
 It did. I'm not sure how long the build ran (the last time I
 checked it'd racked up over 150 minutes of CPU time on a 2GHz
 dual-processor system, and may have run up to about 180 CPU
 minutes) but it finished.
 
 Now we need to cut down the runtimes just a touch. :)
 
 DSWEPIC
 
 Dan Stop Writing Evil Pathological Intermediate Code.
 Hah! I wish I was writing this. Instead, I'm writing a compiler for
 an Evil Pathological Language. :(
Perl? :)
You kidding? Perl's got nothing on this... thing. More than all the 
annoyance, with none of the redeeming features, though it does work. 
(On the other hand, I'm quite looking forward to being able to let it 
die in obscurity where it belongs :)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-09 Thread Leopold Toetsch
Melvin Smith [EMAIL PROTECTED] wrote:

 Since that is my comment I'll comment.

 When I wrote the allocator, I oriented it around a basic block, and I
 figured a basic block would never have more than a few hundred symbols, so
 the N x N array was the fastest possible data structure for keeping the
 interference data.

The problem is that the matrix isn't built per block but per compilation
unit. And all symbols of all 4 kinds are inside one graph albeit they
can't interfer.

The first thing todo is probably to split the matrix into four pieces.
The whole register allocation could be done in four passes too, but that
adds runtime overhead. OTOH data structures get smaller.

 I'm not sure how many symbols we are talking about here, it would help if
 someone threw out a number before I go and make the wrong statement.

Dan posted some numbers: P 26951, S 2099, I 8878 for evil program. The
program dies because of memory shortage when setting up the register
life information.

 I'm pretty sure that it would help if imcc had two modes of symbol
 analysis. Routine wide register allocation ends up with better quality
 code, but at the cost of compilation time and possible cases that can't be
 handled with the current design. If imcc sees too many symbols, it should
 try to break the sub down into basic blocks for interference analysis, but
 that also adds additional code requirements for when you put the basic
 blocks together and figure out loads, spills and reuse across basic blocks.

Well, I think we have the information to allocate registers per basic
block. It would need some rearrangment of data structures, though.

Anyway, I already have added a first allocation run that allocates all
temporary registers of all blocks in one bunch. This pass doesn't need
life range structures. All these aready allocated registers are not part
of the next computations and are not in the interference gprah any more.

 -Melvin

leo


Re: register allocation

2004-08-09 Thread Dan Sugalski
At 1:13 PM +0200 8/8/04, Leopold Toetsch wrote:
Leopold Toetsch [EMAIL PROTECTED] wrote:
 You can verify this step by running -v:
$ parrot -v inv_mod.imc 21 | grep symb
build_reglist: 5783 symbols
allocate_non_interfering, now: 2205 symbols
It really should help.
It did. I'm not sure how long the build ran (the last time I checked 
it'd racked up over 150 minutes of CPU time on a 2GHz dual-processor 
system, and may have run up to about 180 CPU minutes) but it finished.

Now we need to cut down the runtimes just a touch. :)
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-09 Thread Melvin Smith
At 11:22 AM 8/9/2004 -0400, Dan Sugalski wrote:
At 1:13 PM +0200 8/8/04, Leopold Toetsch wrote:
Leopold Toetsch [EMAIL PROTECTED] wrote:
 You can verify this step by running -v:
$ parrot -v inv_mod.imc 21 | grep symb
build_reglist: 5783 symbols
allocate_non_interfering, now: 2205 symbols
It really should help.
It did. I'm not sure how long the build ran (the last time I checked it'd 
racked up over 150 minutes of CPU time on a 2GHz dual-processor system, 
and may have run up to about 180 CPU minutes) but it finished.

Now we need to cut down the runtimes just a touch. :)
DSWEPIC
Dan Stop Writing Evil Pathological Intermediate Code.
-Melvin



Re: register allocation

2004-08-09 Thread chromatic
On Mon, 2004-08-09 at 13:12, Melvin Smith wrote:

 DSWEPIC
 
 Dan Stop Writing Evil Pathological Intermediate Code.

Technically, that should be CWSWCTWEPIC, or Compiler Writers Stop
Writing Compilers That Write Evil Pathological Intermediate Code.

Automating the process of shuddering while reading generated code
certainly helped me sleep more restfully.

-- c



Re: register allocation

2004-08-08 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

 The code, when I try with -v, gives:

   debug = 0x0
   Reading forms/order.imc
   using optimization '0' (0)
   Starting parse...
   warning:imcc:build_reglist: probably too small HASH_SIZE (29451 symbols)
   error:imcc:make_life_range: Out of mem calloc-ing 12 bytes

 And then dies.

Ok. Too many symbols still. I've now enabled again ALLOCATE_HACK (which
should get renamed, if it works :)
Before the life analysis is started, 3 registers of one kind get
allocated in one pass, if they are in one basic block only and don't
overlap. This should color a lot of the temps immediately. These already
colored temps are then not considered in further computations.

You can verify this step by running -v:

build_reglist: 36 symbols
allocate_non_interfering, now: 12 symbols

This is from a program that does arithmetics with the typical chain of
intermediate temps.

leo


Re: register allocation

2004-08-08 Thread Melvin Smith
At 10:54 AM 8/7/2004 -0400, Dan Sugalski wrote:
At 12:45 PM +0200 8/7/04, Leopold Toetsch wrote:
Sean O'Rourke [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Leopold Toetsch) writes:
 The interference_graph size is n_symbols * n_symbols *
 sizeof(a_pointer). This might already be too much.
 2) There is a note in the source code that the interference graph could
 be done without the N x N graph array. Any hints welcome (Angel Faus!).
Since that is my comment I'll comment.
When I wrote the allocator, I oriented it around a basic block, and I 
figured a basic block would never have more than a few hundred symbols, so 
the N x N array was the fastest possible data structure for keeping the 
interference data.

I'm not sure how many symbols we are talking about here, it would help if 
someone threw out a number before I go and make the wrong statement.

I'm pretty sure that it would help if imcc had two modes of symbol 
analysis. Routine wide register allocation ends up with better quality 
code, but at the cost of compilation time and possible cases that can't be 
handled with the current design. If imcc sees too many symbols, it should 
try to break the sub down into basic blocks for interference analysis, but 
that also adds additional code requirements for when you put the basic 
blocks together and figure out loads, spills and reuse across basic blocks. 
There were a lot of things that I had planned for imcc that just never 
materialized. It is still very primitive when it comes to symbol analysis, 
even with all the work Leo has done.

-Melvin



Re: register allocation

2004-08-07 Thread Sean O'Rourke
[EMAIL PROTECTED] (Leopold Toetsch) writes:
 The interference_graph size is n_symbols * n_symbols * 
 sizeof(a_pointer). This might already be too much.

 2) There is a note in the source code that the interference graph could 
 be done without the N x N graph array. Any hints welcome (Angel Faus!).

It looks like the way things are used in the code, you can use an
adjacency list instead of the current adjacency matrix for the graph.
If interference is typically sparse, which I think it should be, this
is a win.  So Dan -- while you're in there with the debugging
statements, you might also want to keep a count of how many registers
interfere with each other.

/s


Re: register allocation

2004-08-07 Thread Leopold Toetsch
Sean O'Rourke [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Leopold Toetsch) writes:
 The interference_graph size is n_symbols * n_symbols *
 sizeof(a_pointer). This might already be too much.

 2) There is a note in the source code that the interference graph could
 be done without the N x N graph array. Any hints welcome (Angel Faus!).

 It looks like the way things are used in the code, you can use an
 adjacency list instead of the current adjacency matrix for the graph.

Yeah. Or a bitmap. Or still better, create the interference graph per
basic block. Should be much smaller then.

 /s

leo


Re: register allocation

2004-08-07 Thread Dan Sugalski
At 2:23 PM +0200 8/6/04, Leopold Toetsch wrote:
Leopold Toetsch wrote:
4.2) Basic blocks, life span of symbols inside these blocks and 
loop detection.
Baisc block generation depends on two items:
* labels - a possible branch target
* branching opcodes
Detection of branching opcodes depends currently in a not too 
obvious way[1] on directives in ops/*.ops like:

   goto NEXT()   # no branch, just execute next instruction
   goto ADDRESS(next)  # a branch
Now all find_global and find_lex opcodes *did* use goto 
ADDRESS(next). This might be some relict of experiments with 
exceptions. These opcodes might branch elsewhere, if an exception 
had occured. But its not a branch.

I've change these opcodes to use goto NEXT() and excluded 
PARROT_JUMP_ENEXT from considering this opcode as a branch.

This fix should vastly improve PIR programs that make heavy use of 
find_lex or find_global opcodes (hello Dan).
Hello indeed! Yeah, I make a near-insane use of find_global, as a way 
to avoid huge numbers of .local declarations. This patch took care of 
the double-spill error and a few of the out of memory errors for 
too-complex source. I think I'm down to five forms that don't compile 
because they're too nasty for parrot, which is better than it's been. 
(I've been tweaking the compiler some as I go along as well to try 
and simplify things, so there have been wins there too)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-07 Thread Dan Sugalski
At 9:21 AM -0700 8/6/04, Sean O'Rourke wrote:
[EMAIL PROTECTED] (Leopold Toetsch) writes:
 The interference_graph size is n_symbols * n_symbols *
 sizeof(a_pointer). This might already be too much.
 2) There is a note in the source code that the interference graph could
 be done without the N x N graph array. Any hints welcome (Angel Faus!).
It looks like the way things are used in the code, you can use an
adjacency list instead of the current adjacency matrix for the graph.
If interference is typically sparse, which I think it should be, this
is a win.  So Dan -- while you're in there with the debugging
statements, you might also want to keep a count of how many registers
interfere with each other.
I'll dig in and see if I can throw that info out. I may also add a 
counter to see how many trips through the spill loop each program 
takes. That's where most of the time's being taken, and I'm wondering 
if there's a cutoff point--if you hit N times around the loop you're 
there forever, or something.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-07 Thread Dan Sugalski
At 12:45 PM +0200 8/7/04, Leopold Toetsch wrote:
Sean O'Rourke [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] (Leopold Toetsch) writes:
 The interference_graph size is n_symbols * n_symbols *
 sizeof(a_pointer). This might already be too much.
 2) There is a note in the source code that the interference graph could
 be done without the N x N graph array. Any hints welcome (Angel Faus!).

 It looks like the way things are used in the code, you can use an
 adjacency list instead of the current adjacency matrix for the graph.
Yeah. Or a bitmap. Or still better, create the interference graph per
basic block. Should be much smaller then.
I'm not 100% sure since I've not thrown in a counter yet, but I think 
that it's ultimately not a size problem, rather a case of code that 
can't *ever* be properly colored with the algorithm and 
implementation we have. (Not that reducing the size is a bad thing, 
though -- I'm all for that)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


[PATCH] Re: register allocation

2004-08-07 Thread Steve Fink
On Aug-07, Leopold Toetsch wrote:
 Sean O'Rourke [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] (Leopold Toetsch) writes:
  The interference_graph size is n_symbols * n_symbols *
  sizeof(a_pointer). This might already be too much.
 
  2) There is a note in the source code that the interference graph could
  be done without the N x N graph array. Any hints welcome (Angel Faus!).
 
  It looks like the way things are used in the code, you can use an
  adjacency list instead of the current adjacency matrix for the graph.
 
 Yeah. Or a bitmap.

An adjacency list would definitely be much smaller, but I'd be
concerned that it would slow down searches too much. I think the
bitmap might be worth a try just to see how much the size matters.

Since this is an n^2 issue, splitting out the four different register
types could help -- except that I'd guess that most code with
excessive register usage probably uses one type of register much more
than the rest.

Anyway, I've attached a patch that uses bitmaps instead of SymReg*'s,
which should give a factor of 32 size reduction. I've only tested it
by doing a 'make test' and verifying that the several dozen test
failures are the same before and after (I don't think things are
actually that broken; I think the make system is), but for all I know
it's not even calling the code. That's what you get when I only have a
two hour hacking window and I've never looked at the code before.

 Or still better, create the interference graph per basic block.
 Should be much smaller then.

Huh? Is register allocation done wholly within basic blocks? I thought
the point of the graph was to compute interference across basic
blocks. I guess I should go and actually read the code.
Index: imcc/reg_alloc.c
===
RCS file: /cvs/public/parrot/imcc/reg_alloc.c,v
retrieving revision 1.14
diff -u -r1.14 reg_alloc.c
--- imcc/reg_alloc.c23 Apr 2004 14:09:33 -  1.14
+++ imcc/reg_alloc.c7 Aug 2004 07:11:08 -
@@ -41,7 +41,7 @@
 static void compute_du_chain(IMC_Unit * unit);
 static void compute_one_du_chain(SymReg * r, IMC_Unit * unit);
 static int interferes(IMC_Unit *, SymReg * r0, SymReg * r1);
-static int map_colors(int x, SymReg ** graph, int colors[], int typ);
+static int map_colors(IMC_Unit *, int x, int * graph, int colors[], int typ);
 #ifdef DO_SIMPLIFY
 static int simplify (IMC_Unit *);
 #endif
@@ -58,12 +58,46 @@
 /* XXX FIXME: Globals: */
 
 static IMCStack nodeStack;
-static SymReg** interference_graph;
-/*
-static SymReg** reglist;
-*/
+static int* interference_graph;
 static int n_symbols;
 
+static int* ig_get_word(int i, int j, int N, int* graph, int* bit_ofs)
+{
+int bit = i * N + j;
+*bit_ofs = bit % sizeof(*graph);
+return graph[bit / sizeof(*graph)];
+}
+
+static void ig_set(int i, int j, int N, int* graph)
+{
+int bit_ofs;
+int* word = ig_get_word(i, j, N, graph, bit_ofs);
+*word |= (1  bit_ofs);
+}
+
+static void ig_clear(int i, int j, int N, int* graph)
+{
+int bit_ofs;
+int* word = ig_get_word(i, j, N, graph, bit_ofs);
+*word = ~(1  bit_ofs);
+}
+
+static int ig_test(int i, int j, int N, int* graph)
+{
+int bit_ofs;
+int* word = ig_get_word(i, j, N, graph, bit_ofs);
+return *word  (1  bit_ofs);
+}
+
+static int* ig_allocate(int N)
+{
+// size is N*N bits, but we want don't want to allocate a partial
+// word, so round up to the nearest multiple of sizeof(int).
+int need_bits = N * N;
+int num_words = (need_bits + sizeof(int) - 1) / sizeof(int);
+return (int*) calloc(num_words, sizeof(int));
+}
+
 /* imc_reg_alloc is the main loop of the allocation algorithm. It operates
  * on a single compilation unit at a time.
  */
@@ -446,6 +480,12 @@
 
 /* creates the interference graph between the variables.
  *
+ * data structure is a 2-d array 'interference_graph' where row/column
+ * indices represent the same index in the list of all symbols
+ * (unit-reglist) in the current compilation unit. The value in the
+ * 2-d array interference_graph[i][j] is the symbol unit-reglist[j]
+ * itself.
+ *
  * two variables interfere when they are alive at the
  * same time
  */
@@ -461,7 +501,7 @@
 /* Construct a graph N x N where N = number of symbolics.
  * This piece can be rewritten without the N x N array
  */
-interference_graph = calloc(n_symbols * n_symbols, sizeof(SymReg*));
+interference_graph = ig_allocate(n_symbols);
 if (interference_graph == NULL)
 fatal(1, build_interference_graph,Out of mem\n);
 unit-interference_graph = interference_graph;
@@ -475,8 +515,8 @@
 if (!unit-reglist[y]-first_ins)
 continue;
 if (interferes(unit, unit-reglist[x], unit-reglist[y])) {
-interference_graph[x*n_symbols+y] = unit-reglist[y];
-interference_graph[y*n_symbols+x] = unit-reglist[x];
+ig_set(x, y, n_symbols, interference_graph);

Re: register allocation (was: Spilling problems)

2004-08-07 Thread Dan Sugalski
At 9:51 AM +0200 8/6/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
... extraordinarily large (60-80k lines) ...
... dying with an out-of-memory error.
How many symbols does the sub have? Please set a breakpoint at 
imcc/reg_alloc.c:468 and inspect:
 n_symbols
(or just create a debug print statement there)

The interference_graph size is n_symbols * n_symbols * 
sizeof(a_pointer). This might already be too much.
I don't even get to this spot. (Though at some point we need to do 
something about some of this code, as I see a fair number of globals, 
which makes reentrancy an issue. But that's for later)

The code, when I try with -v, gives:
 debug = 0x0
 Reading forms/order.imc
 using optimization '0' (0)
 Starting parse...
 warning:imcc:build_reglist: probably too small HASH_SIZE (29451 symbols)
 error:imcc:make_life_range: Out of mem calloc-ing 12 bytes
And then dies.
2) There is a note in the source code that the interference graph 
could be done without the N x N graph array. Any hints welcome 
(Angel Faus!).
That'd probably be a good thing... :)
3) We don't rename registers (except when it comes to spilling). So 
when a symbol starts its life somewhere at the beginning of the 
subroutine and is used down to the end, it interfers with each other 
symbol in that range. One register is blocked permanently. When 
there are around 30 such long-ranged symbols, spilling starts, which 
means that the variable is stored into and fetched from a spill 
array.
This could be avoided, if the symbol is a lexical or a global 
variable. On each fetch of this variable it could get a fresh 
register. This currently depends on the source code:

   foo = global foo # bad
   $P1234 = global foo  # good - if each fetch increments the $P...
Right, and this is good info to leave in the archive. I've long-since 
converted away from named .locals to $Pxxx temps, which made quite a 
difference -- probably made another 15-20% of my source compilable. 
Most of the temps are very short-lived, though I put in a caching 
scheme to keep from fetching things too often. (All my library subs 
have to be fetched, and I tried to cut down on the number of 
redundant fetches there)

4) Proper register allocation depends on a correct life analysis of 
the symbol, which depends basically on 2 things:

4.1) the usage of the register in the opcode. Have a look at ops/*.ops:
op add(in PMC, in PMC)
The destination (left hand side) of add is a in register, the PMC 
has to exist beforehand as well as the source operand of course.

op find_lex(out PMC, in STR)
The destination PMC register is filled with the address of the 
lexical. It gets a new pointer there, the usage is written as out. 
This means for the register allocator that the life range of this 
register starts here. Whatever was in that register is dead.

But we have opcodes that silently have such an influence on the 
registers life range. These are mainly stack pop opcodes. Some of 
these opcodes are handled manually inside imcc, but not all. This 
leads to wrong or too long life ranges and to spilling stress.

There was a TODO item (posted on the list) to mark up all opcodes in 
the source to carry along their register usage. This is still badly 
required.
Hrm. Put a notation scheme in place and I'll see about getting that done.
4.2) Basic blocks, life span of symbols inside these blocks and loop 
detection.

A basic block is a stream of opcodes that runs in one step without a 
branch and it isn't a branch target - no other opcode can jump 
inside such a basic block. If a symbol is used as in inside of 
such a block, it had to exist beforehand. Its life range is 
propagated down to the previous block. This is straight forward 
for branches but gets nasty with loops and subroutines.

The code in cfg.c is cluttered by exceptions that try to follow the 
control flow of stack-based subroutines (bsr) and CPS subroutines. 
invoke. This code needs cleaning up and I'm willing to eliminate 
support for old code that uses bsr and doesn't preserve registers. 
The main problem with invoke is that it doesn't carry a specific 
branch destination.

But anyway basic block detechtion should be ok. *But* its totally 
untested, which is bad. Any hints for testing it, patches, test 
frame work is really welcome.
Figure out what you want and I'll come up with more test cases than 
you'd ever want.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


[PATCH] Re: register allocation

2004-08-07 Thread Dan Sugalski
At 9:21 AM -0700 8/7/04, Steve Fink wrote:
On Aug-07, Leopold Toetsch wrote:
 Sean O'Rourke [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] (Leopold Toetsch) writes:
  The interference_graph size is n_symbols * n_symbols *
  sizeof(a_pointer). This might already be too much.
 
  2) There is a note in the source code that the interference graph could
  be done without the N x N graph array. Any hints welcome (Angel Faus!).
  It looks like the way things are used in the code, you can use an
  adjacency list instead of the current adjacency matrix for the graph.
 Yeah. Or a bitmap.
An adjacency list would definitely be much smaller, but I'd be
concerned that it would slow down searches too much. I think the
bitmap might be worth a try just to see how much the size matters.
Since this is an n^2 issue, splitting out the four different register
types could help -- except that I'd guess that most code with
excessive register usage probably uses one type of register much more
than the rest.
There's definitely weighting, but when you get up to the point where 
size matters the split may still be worth it. My Evil Program here 
has:

  P temps: 26951
  S temps: 2099
  I temps: 8878
Definitely skewed, but going from n^2 (27838^2) to i^2+p^2+s^2 looks 
like it'd make quite a difference.

Anyway, I've attached a patch that uses bitmaps instead of SymReg*'s,
which should give a factor of 32 size reduction. I've only tested it
by doing a 'make test' and verifying that the several dozen test
failures are the same before and after (I don't think things are
actually that broken; I think the make system is), but for all I know
it's not even calling the code. That's what you get when I only have a
two hour hacking window and I've never looked at the code before.
I applied this patch, though it didn't change the results on my nasty 
programs. The size reduction seems worth it, though.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-07 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:

 I'll dig in and see if I can throw that info out. I may also add a
 counter to see how many trips through the spill loop each program
 takes.

*If* it doesn't die, you can run parrot with -v and you'll get the
spill count.

leo


Re: register allocation

2004-08-07 Thread Dan Sugalski
At 5:35 PM +0200 8/7/04, Leopold Toetsch wrote:
Dan Sugalski [EMAIL PROTECTED] wrote:
 I'll dig in and see if I can throw that info out. I may also add a
 counter to see how many trips through the spill loop each program
 takes.
*If* it doesn't die, you can run parrot with -v and you'll get the
spill count.
Alas it does, reliably. Looks like the second loop in imc_reg_alloc 
is where it goes horribly wrong.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-08-06 Thread Leopold Toetsch
Leopold Toetsch wrote:
4.2) Basic blocks, life span of symbols inside these blocks and loop 
detection.
Baisc block generation depends on two items:
* labels - a possible branch target
* branching opcodes
Detection of branching opcodes depends currently in a not too obvious 
way[1] on directives in ops/*.ops like:

   goto NEXT()   # no branch, just execute next instruction
   goto ADDRESS(next)  # a branch
Now all find_global and find_lex opcodes *did* use goto ADDRESS(next). 
This might be some relict of experiments with exceptions. These opcodes 
might branch elsewhere, if an exception had occured. But its not a branch.

I've change these opcodes to use goto NEXT() and excluded 
PARROT_JUMP_ENEXT from considering this opcode as a branch.

This fix should vastly improve PIR programs that make heavy use of 
find_lex or find_global opcodes (hello Dan).

leo
[1] see lib/Parrot/OpsFile.pm line 525ff


Re: Register allocation bug in IMCC? Or misunderstanding on my part?

2003-06-02 Thread Leopold Toetsch
Clinton A. Pierce [EMAIL PROTECTED] wrote:
 The following code:

  push_integer() not implemented in class 'PerlHash'

 This is, as far as I can tell, because the same register is used by IMCC
 for both the READDATA and RESTOREINFO locals, thus by the time that the sub
 _data gets around to being run, READDATA has become a PerlHash (imcc -t):

PC=0; OP=821 (new_p_ic); ARGS=(P0, 14)
PC=3; OP=821 (new_p_ic); ARGS=(P0, 15)

Yep. Good analysis.

You could tell it a bug. Imcc should follow the code path into the
subroutine. But he problem is: the subroutine isn't known, when _main is
compiled. So IMCC only sees one usage of the Array and one usage of the
Hash amd assigns both to the same register P0.

If you want really use such constructs, you can't put them in different
compilation units, because they are basically one unit.

Much better is to implement some kind of calling conventions.

As soon as you pass the READDATA array to the subroutine, it will have
its own register and the problem is gone.

leo


Re: Register allocation bug in IMCC? Or misunderstanding on my part?

2003-06-02 Thread Clinton A. Pierce
At 06:57 PM 6/1/2003 +0200, Leopold Toetsch wrote:
Clinton A. Pierce [EMAIL PROTECTED] wrote:
 The following code:
  push_integer() not implemented in class 'PerlHash'

 This is, as far as I can tell, because the same register is used by IMCC
 for both the READDATA and RESTOREINFO locals, thus by the time that the sub
 _data gets around to being run, READDATA has become a PerlHash (imcc -t):
PC=0; OP=821 (new_p_ic); ARGS=(P0, 14)
PC=3; OP=821 (new_p_ic); ARGS=(P0, 15)
Yep. Good analysis.

You could tell it a bug. Imcc should follow the code path into the
subroutine. But he problem is: the subroutine isn't known, when _main is
compiled. So IMCC only sees one usage of the Array and one usage of the
Hash amd assigns both to the same register P0.
Makes sense.


If you want really use such constructs, you can't put them in different
compilation units, because they are basically one unit.
So a BASIC IMCC program, with all of the builtin functions it needs, plus 
the actual user program itself as one compilation unit?  All for the sake 
of a few needed globals?  Sounds ghastly.

Much better is to implement some kind of calling conventions.
Except that these represent true global data structures that the entire 
BASIC machine needs.  They're not really things to be passed around from 
place to place and the thought of having a Px register to hold the state 
of BASIC and passing it (potentially) everywhere seems kinda silly for a 
language that isn't really threaded in any incarnation and is ever only 
going to need one state.

I've *got* good, sound calling conventions now but there are things (sofar, 
only half a dozen*) that exist in an outermost scope that need initialization.

I could turn the whole thing upside-down (subs first, main last) except 
that I'd need a token outer main() to call the initialization routine.  And 
from a quick test IMCC can tell that the execution path jumped down, and 
optimizes those registers away anyway.  Nevermind.

Further suggestions, given better info?

* Examples of things that are really global in a BASIC machine: current 
random number seed, cursor column position, console current character 
attributes, last value picked up from a READ/DATA combo (line  position), 
structure definition table, and a GOSUB call-stack lookback specifically 
for RETURN x.




Re: Register allocation bug in IMCC? Or misunderstanding on my part?

2003-06-02 Thread Luke Palmer
 If you want really use such constructs, you can't put them in different
 compilation units, because they are basically one unit.
 
 So a BASIC IMCC program, with all of the builtin functions it needs, plus 
 the actual user program itself as one compilation unit?  All for the sake 
 of a few needed globals?  Sounds ghastly.

No.  Global things shouldn't be in registers in the first place,
because you're using up your computation resources for things that
have a very long lifetime.

 Much better is to implement some kind of calling conventions.
 
 Except that these represent true global data structures that the entire 
 BASIC machine needs.  They're not really things to be passed around from 
 place to place and the thought of having a Px register to hold the state 
 of BASIC and passing it (potentially) everywhere seems kinda silly for a 
 language that isn't really threaded in any incarnation and is ever only 
 going to need one state.

If you use them often... then maybe one or two need a real register,
but I'd still be weary of doing that.  Use find_global and its
friends. 

 I've *got* good, sound calling conventions now but there are things (sofar, 
 only half a dozen*) that exist in an outermost scope that need initialization.
 
 I could turn the whole thing upside-down (subs first, main last) except 
 that I'd need a token outer main() to call the initialization routine.  And 
 from a quick test IMCC can tell that the execution path jumped down, and 
 optimizes those registers away anyway.  Nevermind.
 
 Further suggestions, given better info?
 
 
 * Examples of things that are really global in a BASIC machine: current 
 random number seed, cursor column position, console current character 
 attributes, last value picked up from a READ/DATA combo (line  position), 
 structure definition table, and a GOSUB call-stack lookback specifically 
 for RETURN x.

Yeah, most of those seem like they can be stored in the global name
table instead of a register.  Load them in when you need them.  Except
the latter, which seems like you should be using the call stack, if
I'm not misunderstanding.

Luke


Re: Register allocation bug in IMCC? Or misunderstanding on my part?

2003-06-02 Thread Uri Guttman
 CAP == Clinton A Pierce [EMAIL PROTECTED] writes:

  CAP Yes, my friend.  Continue to use quotes around the word language
  CAP when describing this mess.

   Without RETURN x, just plain old GOSUB/RETURN I could let PIR/PASM
   handle all by itself with no supervision.
   
   Or you handle normal subs like that and generate special code for
   return x?

  CAP I could, except that it's impossible to tell where exactly the exit
  CAP point of a subroutine (for GOSUB) is in a BASIC program.  In fact,
  CAP all GOSUB destinations could use an unconditional branch to all use
  CAP a common RETURN point.  You just return when you stumble across a
  CAP return.  Further analysis of where the subroutines are may be
  CAP impossible with mechanical (or human) analysis at compile time:

  CAP  5 rem Valid but seriously spaghettified BASIC example.
  CAP  7 rem Good luck analyzing this at compile time for entry
  CAP  9 rem  and exit points for the subroutines.
  CAP  10 print Hello, world!
  CAP  20 for i = 1 to 3
  CAP  30 gosub 70
  CAP  40 next
  CAP  50 end
  CAP  70 gosub mysub
  CAP  80 gosub 110
  CAP  90 return
  CAP  100 return 30
  CAP  110 rem alternate entry point
  CAP  mysub:
  CAP  print Hello, galaxy!
  CAP  on int(rnd(2)+1) goto 90, 100

okay, i am going to jump in and try to drown here. :)

why don't you manage your own basic call stack in an array or PerlArray
or something? trying to map that mess of call/return poo onto a proper
compiler with register allocation is going to lose us the services of
leo while he recuperates at the hospital for homicidal BASIC coders.

you will just have to bite the bullet and emulate your own virtual
engine over parrot and not try to map BASIC directly to it. so code a VM
with BASIC flow control. you can probably do something odd like make
each BASIC statement into a parrot sub. then just compile your basic
into a simple flow control IL that just calls parrot subs. the subs all
have access to your need globals (with the obvious
find_global/set_global :) a primary global now is the BASIC VM PC.

so your code gen is much easier too. each statement is codegenned and
you keep their label or sequence numbers in your flow tree and they map
to a parrot sub address (that you generate as you codegen). later to
execute you just scan your own flow tree (that is the basic interpreter
engine), call each parrot sub (BASIC statement)in turn and let parrot
to the real work in each statement. BASIC calls and returns are handled
with munging the global BASIC PC and the global stack.

feel free to tell me this is total cocky-pop. :)

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org