On Fri, 2003-06-06 at 15:12, Dan Sugalski wrote:
> Our options, as I see them, are:
>
> 1) Make the I registers 64 bits
> 2) Make some way to gang together I registers to make 64 bit things
> 3) Have I registers switchable between 32 and 64 bit somehow
> 4) Have separate 32 and 64 bit I registers
> 5) Do guaranteed 64 bit math in PMCs
>
> The first is just out. It's an unreasonable slowdown on 32 bit (and
> some 64 bit) machines, for no overall win. The majority of integers
> will be smallish, and most of even the 32 bit range will be wasted.
I don't necessarily agree that this option is gone. IREGs are basically
used for one of two things: to do non-PMC integer math, and to pass
things to and from Parrot's guts. (And the latter is just a store and a
load. I think passing them throughout Parrot is where the problem is.)

That leaves doing non-PMC integer math, which just doesn't sound like a
whole lot. (But then again, I'm assuming that most math will be
PMC-based, in order to handle int->num->str->big type conversions. If
we want to minimize PMC math, then perhaps this is a bigger deal.)

You know, there was a day when we'd just write some code and benchmark
it to see *how* much slower it is.... No, no, no. Don't get up. I'll
do it. :-)

I glued together most of the IREG-based arithmetic pasm files, removed
the prints, and wrapped an iterator around the lot. Athlon 1 GHz,
Linux 2.4.20. Identical Parrot configurations, save the size of
INTVALs.

    long long INTVALs: 4.98u @ 54%
    long INTVALs     : 4.31u @ 54%

Difference: .67u @ 54%, or about 15%. (With the JIT, long long INTVALs
were *much* faster, but only because they cheated and dumped core.)

So what percentage of a program is using the IREGs for math? 10%? 5%?
2%? That's a 1.5% to 0.3% overall slowdown. Keep those numbers in mind.

> I don't like option 2, since it means that we speed-penalize 64 bit
> systems, which seems foolish.

See below.

> Option 3 wastes half the L1 cache space that I registers takes up.
> Fluffy caches--ick. Plus validating the bytecode will be...
> interesting, even at runtime.

See below.

> 4 isn't that bad. Not great, as it's more registers, and something of
> a waste on 64 bit systems, but...

See below.

> #5 is something of a cop-out, but I'm not quite sure how much.

See below.

> From what I can think, we need guaranteed 64 bit integers for file
> offsets, JVM & .NET support, and some fairly special-purpose math
> stuff. I'd tend to discount the special-purpose math stuff--that's
> not our target.
> JVM and .NET don't do much 64 bit stuff, but they do
> some. The file offset parts are in some ways the least of it, though
> we do need to have some internal support for 64 bits to get integer
> values out of PMCs without loss.

See below. Oh, wait. This *is* below. Okay, see here.

Let's back up a step. When it comes to integers, there are two types -
no pun intended - of languages: those that care, and those that don't.

Sized integer math has two intertwined properties: dynamic range and
mathematical semantics. (Dynamic range says that 8 bits can hold 8
bits' worth of stuff, whether it's interpreted as signed, unsigned, or
normalized (like exponents in IEEE floating point representations); as
either numbers or bits. Mathematical semantics are what make
(int32_t)(int8_t)((int8_t)0x66 + (int8_t)0x66) come out to
(int32_t)0xffffffcc rather than 0x000000cc: the sum overflows 8 bits,
wraps, and sign-extends.)

Although there will be cases where a typed language doesn't really care
how large the range or what the mathematical semantics for a given type
are, there will be times when it does. So we've either got to provide,
somehow, all types, or provide one type that emulates the semantics of
all types. Untyped languages simply don't care what they get
underneath, as long as they work. Except, of course, when they're
trying to tie into a typed language. (Pass a 16-bit int from Java to
Perl, do some stuff, and pass it back, for instance.)

Hardware handles this with different ops, of course, although compilers
cheat where they can (or have to). For Parrot, however, that means
multiplying the number of ops by 4 or 5. (Multiple-IREG ops would
still be a common multiple and not an exponent, as you'd promote both
integers to the same size.) We're op-heavy already, and Parrot would
then have to track integer sizes. (Although for untyped languages
that'd be easy, as they'd all be one size.) Plus, you'd have to map
those onto the common set of IREGs. Or create 4 or 5 more.
(And then you'd have to decide how to handle things like integer
promotion.)

Of course, you could continue to handle this with one op, albeit one
smart enough to handle the semantics of whatever size math you're
doing. That way, you'd only be doing the slow, 64-bit math when you
absolutely needed to. The problem is, of course, those numbers up top
I told you to remember. Writing that smart op is going to cost you far
more than a mere 1.5%. You've slowed everything down to speed up one
case, which, by the way, didn't speed up, because you're jumping
through such hoops to avoid it.

Even the JIT may not handle this efficiently. Certainly, at everything
less than native size, it's normally a trivial tweak or two:

    .L2:
            movb    $4, -1(%ebp)
            movb    $10, -2(%ebp)
            movb    -2(%ebp), %al
            addb    -1(%ebp), %al
            movb    %al, -3(%ebp)
    .L3:
            movl    $1434, -8(%ebp)
            movl    $345344, -12(%ebp)
            movl    -12(%ebp), %eax
            addl    -8(%ebp), %eax
            movl    %eax, -16(%ebp)

But once you have to loosen your belt, your code blows up to:

    .L4:
            movl    $24234234, -24(%ebp)
            movl    $0, -20(%ebp)
            movl    $42342342, -32(%ebp)
            movl    $0, -28(%ebp)
            movl    -32(%ebp), %eax
            movl    -28(%ebp), %edx
            addl    -24(%ebp), %eax
            adcl    -20(%ebp), %edx
            movl    %eax, -40(%ebp)
            movl    %edx, -36(%ebp)

So that brings us back to one big, flat space: either the ideal system
width, which will run faster, or the largest width possible. If we
choose the ideal system width, it may be too small to support typed
languages, or the occasional system metric which requires it. (Like
64-bit file offsets, which Dan ever-so-kindly reminded me of.) If we
choose the largest width possible, we slow things down, but we can
mostly support everything. I say mostly, because there's no telling how
typed languages will feel about being run atop a unitype system,
regardless of the size of that one type.
Of course, the languages should feel free either to create their own
PMCs that map to those types, or to create the ops that their compiler
would generate to handle the mathematical semantics of those types
within Parrot's unitype:

    inline op add_8 (out INT, in INT, in INT) {
        $1 = (INTVAL)(int8_t)((int8_t)$2 + (int8_t)$3);
        goto NEXT();
    }

    inline op add_16 (out INT, in INT, in INT) {
        $1 = (INTVAL)(int16_t)((int16_t)$2 + (int16_t)$3);
        goto NEXT();
    }

(Note the cast of the sum back through the sized type before widening;
that's what preserves the sized wraparound semantics under C's integer
promotion.) That puts the impetus on each language to track its own
types, but all types map correctly in, out, and between languages.
(And at the cost of only one more instruction.) And it only affects
those in need.

But then we're back to where we started, with these big INTVALs running
amok needlessly throughout Parrot. We certainly want to minimize their
usage, and, practically speaking, their usage is language-level math.
After all, C is a typed language, and if we're going to interface with
C (or, more accurately, Parrot's internals and the underlying system,
which are written in C), then we can do it like above. Luckily for us,
Parrot's internals (at any given time) are pretty well fixed, which
means that if an op - say, print - needs to pass a file number, that
number will always be an int.

       |---- LANGUAGE LAYER ----|----- INTERPRETER LAYER -----|

    program <-> registers <-> ops <-> parrot internals <-> system

       |---- ARBITRARY SIZES ---|------- SYSTEM SIZES --------|

The boundary between op code and Parrot internals is also the boundary
between where arbitrary numbers are needed for language support, and
where they're useless for the system. So let's convert when we cross
that boundary.

1) We gain a performance boost in Parrot's internals, in both faster
and smaller code.

2) We suffer a slight penalty in IREG math. (But we don't suffer a
larger penalty trying to avoid it.)

3) We keep Parrot simple, and, well.... KISS.
4) We push the complexity and the decisions of integer types to the
specific languages to implement as they see fit - PMC, op, or don't
really care - while providing a common type to convert through, and
without tying them to one all-encompassing model.

5) The coding rules are simple: ops are built on INTVALs; Parrot
internals are not. [1]

How far away are we? For Parrot's internals, it's largely a
substitution job: find the right type for the job, and fix the code.
The ops need explicit casting added. The biggest problem is probably
the JIT, because mandating 64-bit support means a long long on x86,
which doesn't JIT right now. But, overall, that's not that far.

Thoughts?

[1] Of course, you know there *has* to be an exception. Currently,
Parrot internally provides some direct support routines explicitly for
INTVALs, namely stringification as part of the various *printf
routines. I consider those types of routines more of an "op support
library" than Parrot internals. (Functionally, although certainly not
lexically, as it currently stands.)

--
Bryan C. Warnock
bwarnock@(gtemail.net|raba.com)