Re: [m5-dev] X86 performance

Gabe Black Sat, 23 Oct 2010 00:56:03 -0700

 On 10/22/10 21:23, nathan binkert wrote:
>> What do you mean by "given a certain prefix"? X86's encoding is fairly
>> amorphous, and it's not always possible to tell early on how big an
>> instruction will be. It's possible in general of course, so it depends
>> on what you actually mean there. Would chasing through linked lists like
>> that slow things down a lot with memory dependencies, caching, etc.?
> Let me be precise.  Given a valid instruction array inst1 with length
> i, is it ever possible for there to be a valid instruction array inst2
> of length j where j != i and inst1[k] == inst2[k] where 1 <= k <=
> min(i,j).


Yes, definitely. X86 defines a number of prefixes that can be applied in
(almost) any combination, quantity and/or order at the start of an
instruction. You can, unless I'm misremembering, specify any of the
legacy prefixes 13 times in a row then another once, and then a one byte
opcode. The same instruction could be encoded by exchanging the
prefixes, moving the unique one around, etc. The possibilities aren't
endless, but they're certainly numerous.

> A concrete example.  Is it ever possible for an instruction that's 8
> bytes long to have the same 8 bytes as the first 8 bytes of an
> instruction whose length is greater than 8.

In this case mostly no, unless you allow the mode to change. The inc/dec
instructions in legacy/compat modes are repurposed as the REX (64 bit
stuff) prefixes in 64 bit mode, so if you use those in one case it would
be the opcode/end of the instruction, and in the other it would be
followed by a bunch of stuff, at least an opcode byte.

> As for the pointers (it's not a list, it's a tree).  The maximum depth
> of the tree is equal to the maximum length of and instruction in bytes
> and the average depth is the average instruction length.

It is a tree and not a list, but you could consider it a list with
harder to guess next pointers.

>> We
>> would have to find some way of keeping the bytes around at least to
>> reprocess with the predecoder on a miss, and it might be easiest to do
>> that byte by byte since the alignment is arbitrary. The way I hash the
>> ExtMachInst right now is just to pack all the fields together into same
>> lengthed chunks, xor those, and then hash the result, and that may just
>> not be a very good hash function. By putting things into a tree we might
>> avoid the entire issue of wonky roll-your-own hash functions. All in all
>> I'm not sure the tree idea would work, but it's definitely at least good
>> to consider.
> What don't you think would work about it?  One benefit of it is that
> you don't need to know how long an instruction is.
>
>   Nate
>

Well, it does break your assumptions above. I don't know exactly why
they were necessary to make so I can't say why that affects things, but
I'll assume it does. Do you think the pointer chasing would be an issue?
The chain could be pretty long and probably pretty hard to predict. I
think functionally it'd probably work, but other issues might derail it.
One way or the other it wouldn't be that hard to implement, so I might
just give it a shot when I eventually start working on this and see what
happens.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Re: [m5-dev] X86 performance

Reply via email to