[slightly rearranged]

Dan Sugalski wrote:
[snip]
> 2) Make some way to gang together I registers to make 64 bit things
[snip]
> I don't like option 2, since it means that we speed-penalize 64 bit
> systems, which seems foolish.

I think that it could be done in a way that doesn't significantly
penalize 64 bit machines.

First, define a pair of ops:

   op canon_to_native_L(out int, out int, in int, in int)

On a 32 bit machine, this would be implemented as:

   {   int tmp1 = $3, tmp2 = $4;
       $1 = tmp1; $2 = tmp2;
       goto NEXT();
   }

On a 64 bit machine, this would be implemented as:

   {   $1 = ($3 << 32) | ($4 & 0x00000000FFFFFFFF);
       goto NEXT();
   }

I expect that for most of the uses of this, we'd be modifying registers
"in-place" from pairs of 32-bit values to single 64-bit values.

On those platforms where no change needs to be made to the data (i.e.,
the "int" type, and thus our "I" type, is only 32 bits), the
interpreter could treat an in-place use of this op as a no-op when it
loads the bytecode, and discard it.  Thus, there's no cost to it on
those platforms.
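
A rough sketch of that load-time filtering, in C.  Everything here is
made up for illustration: the opcode numbers, the fixed four-operand
layout, and the load_ops() name are not Parrot's real bytecode format.
The op is only dropped when it is used in-place (outputs name the same
registers as inputs), since only then does the 32-bit version do
nothing.

```c
#include <stddef.h>

/* Hypothetical opcode numbers, for illustration only. */
enum { OP_CANON_TO_NATIVE_L = 1, OP_ADD_L = 2 };

/* In this toy stream every op is one opcode word plus 4 operands. */
#define OPERANDS 4

/* Copy ops from in[] to out[].  When int is 32 bits, drop every
   in-place canon_to_native_L, because it would change nothing.
   Returns the number of words written to out[]. */
static size_t load_ops(const int *in, size_t n_words, int *out,
                       int int_bits)
{
    size_t w = 0;
    for (size_t i = 0; i + OPERANDS < n_words; i += 1 + OPERANDS) {
        int in_place = in[i + 1] == in[i + 3] && in[i + 2] == in[i + 4];
        if (int_bits == 32 && in[i] == OP_CANON_TO_NATIVE_L && in_place)
            continue;               /* no-op here: discard it */
        for (size_t k = 0; k <= OPERANDS; k++)
            out[w++] = in[i + k];   /* keep the op unchanged */
    }
    return w;
}
```

The check runs once per op at load time, so nothing is left behind in
the run loop on 32-bit platforms.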

On a 64 bit platform, well, we're wasting the memory that could be in
$2, but at least there's no *speed* penalty here.

   op native_to_canon_L(out int, out int, in int, in int)

This, of course, would do the opposite of the other op, changing the
"native" 64-bit integer into two 32-bit values.

The mathematical opcodes would, for long arithmetic, each use a pair of
registers for each input.  They'd only work on "nativeized" long values.
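
For concreteness, the two conversions could be sketched in plain C
roughly as follows.  The function names are hypothetical, and I'm
assuming the first register of a pair holds the high half, as in the
64-bit op body above.  The casts through unsigned types are there so
that a negative low half doesn't sign-extend into the high half.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are not taken from Parrot. */

/* Combine a high and a low 32-bit half into one 64-bit value. */
static int64_t canon_to_native(int32_t hi, int32_t lo)
{
    /* Work in unsigned types: no shift of a negative value, and no
       sign extension of the low half. */
    uint64_t u = ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
    return (int64_t)u;   /* assumes two's complement conversion */
}

/* Split a 64-bit value back into its high and low 32-bit halves. */
static void native_to_canon(int64_t v, int32_t *hi, int32_t *lo)
{
    uint64_t u = (uint64_t)v;
    *hi = (int32_t)(uint32_t)(u >> 32);
    *lo = (int32_t)(uint32_t)(u & 0xFFFFFFFFu);
}
```

Splitting a value and recombining the halves gives back the original
value, which is the property the pair of ops relies on.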

And of course, on a machine with 64-bit ints, the second register of
each pair (c|w)ould be ignored.
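
As an illustration of what a register-pair math op has to do on a
machine with only 32-bit ints, here is a sketch of a long add with a
carry between the halves.  The reg_pair type and add_pair() are
invented for this example; a real op would read and write I registers
directly.

```c
#include <stdint.h>

/* Hypothetical register pair: the two 32-bit halves of one long. */
typedef struct { uint32_t hi, lo; } reg_pair;

/* Add two long values held as register pairs, 32 bits at a time. */
static reg_pair add_pair(reg_pair a, reg_pair b)
{
    reg_pair r;
    r.lo = a.lo + b.lo;               /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);   /* wrapped iff sum < an operand */
    r.hi = a.hi + b.hi + carry;       /* propagate into the high half */
    return r;
}
```

On a machine with 64-bit ints the same op would just be a single add
on the first register of each pair.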

> 3) Have I registers switchable between 32 and 64 bit somehow

> Option 3 wastes half the L1 cache space that I registers takes up.
> Fluffy caches--ick. Plus validating the bytecode will be...
> interesting, even at runtime.

Consider the following optimization for my suggestion above:

On those machines where "int" is 32 bits, but a 64 bit "long long"
exists, our interpreter, upon loading the bytecode, could detect when
pairs of register arguments to 64-bit math ops are adjacent and aligned,
and could replace that op with an alternate equivalent one, which is
optimized by casting the (interpreter->ctx.int_reg.registers) into a
(long long*) and accessing the two registers as one register.

Note that these alternate versions of long-int opcodes would never
appear in portable parrot assembly files -- the interpreter would
replace register-pair math ops with "long long" ops when loading the
bytecode, and only do so when it's safe/valid/correct to do so.  Thus,
there's no extra work when validating the bytecode.
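
A sketch of the check and of a fused op, in C.  The register file,
its size, and all the function names are stand-ins of my own, not the
real interpreter structures.  I've used memcpy rather than the
(long long*) cast so the sketch stays within C's aliasing rules;
compilers generally turn it into a single 64-bit load or store.  Note
that which register of the pair ends up holding the high half depends
on the machine's byte order.

```c
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32                  /* assumed register count */
static int32_t int_regs[NUM_REGS];   /* stand-in for the I registers */

/* A pair can be fused when the registers are adjacent and the first
   index is even, so the pair sits on a 64-bit boundary of the file. */
static int pair_is_fusable(int first, int second)
{
    return first >= 0 && second == first + 1 && second < NUM_REGS
        && first % 2 == 0;
}

/* Read two adjacent 32-bit registers as one 64-bit value. */
static uint64_t load_pair(int first)
{
    uint64_t v;
    memcpy(&v, &int_regs[first], sizeof v);
    return v;
}

/* Write one 64-bit value across two adjacent 32-bit registers. */
static void store_pair(int first, uint64_t v)
{
    memcpy(&int_regs[first], &v, sizeof v);
}

/* Fused long add: one 64-bit add instead of two adds plus a carry. */
static void add_ll(int dst, int a, int b)
{
    store_pair(dst, load_pair(a) + load_pair(b));
}
```

The loader would call something like pair_is_fusable() on each
register-pair argument and only substitute the fused op when every
pair passes.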

> 4 isn't that bad. Not great, as it's more registers, and something of
> a waste on 64 bit systems, but...

When you say "64-bit system" here, do you mean ones where "int" is 64
bits, or where "int" is 32 bits, but there just happens to exist a 64
bit integer type as well?

Certainly for the latter, it's not a waste.

And for the former... well, we'd be wasting half of the memory that's in
our "32-bit" registers (since we'd still use 64 bits of storage for each
of our registers, even though we're "using" only 32 bits of it), but
there's no speed penalty, and unless there's overflow of the 32 LSB,
there's little harm in using a 64 bit integer as if it were a 32 bit
integer.

The big waste, of course, is that if code doesn't *use* them, then
saving them is still costly, with nothing gained in return.

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED]
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}
