Re: [m5-dev] X86 performance

Gabe Black Sat, 23 Oct 2010 01:04:01 -0700

 On 10/23/10 00:56, Gabe Black wrote:
>  On 10/22/10 21:23, nathan binkert wrote:
>>> What do you mean by "given a certain prefix"? X86's encoding is fairly
>>> amorphous, and it's not always possible to tell early on how big an
>>> instruction will be. It's possible in general of course, so it depends
>>> on what you actually mean there. Would chasing through linked lists like
>>> that slow things down a lot with memory dependencies, caching, etc.?
>> Let me be precise.  Given a valid instruction array inst1 with length
>> i, is it ever possible for there to be a valid instruction array inst2
>> of length j where j != i and inst1[k] == inst2[k] where 1 <= k <=
>> min(i,j).
> Yes, definitely. X86 defines a number of prefixes that can be applied in
> (almost) any combination, quantity and/or order at the start of an
> instruction. You can, unless I'm misremembering, specify any of the
> legacy prefixes 13 times in a row then another once, and then a one byte
> opcode. The same instruction could be encoded by exchanging the
> prefixes, moving the unique one around, etc. The possibilities aren't
> endless, but they're certainly numerous.


Oh, wait, inst1[k] == inst2[k]. I read that as !=. See the paragraph
below then.

>> A concrete example.  Is it ever possible for an instruction that's 8
>> bytes long to have the same 8 bytes as the first 8 bytes of an
>> instruction whose length is greater than 8.
> In this case mostly no, unless you allow the mode to change. The inc/dec
> instructions in legacy/compat modes are repurposed as the REX (64 bit
> stuff) prefixes in 64 bit mode, so if you use those in one case it would
> be the opcode/end of the instruction, and in the other it would be
> followed by a bunch of stuff, at least an opcode byte.
>
>> As for the pointers (it's not a list, it's a tree).  The maximum depth
>> of the tree is equal to the maximum length of and instruction in bytes
>> and the average depth is the average instruction length.
> It is a tree and not a list, but you could consider it a list with
> harder to guess next pointers.
>
>>> We
>>> would have to find some way of keeping the bytes around at least to
>>> reprocess with the predecoder on a miss, and it might be easiest to do
>>> that byte by byte since the alignment is arbitrary. The way I hash the
>>> ExtMachInst right now is just to pack all the fields together into same
>>> lengthed chunks, xor those, and then hash the result, and that may just
>>> not be a very good hash function. By putting things into a tree we might
>>> avoid the entire issue of wonky roll-your-own hash functions. All in all
>>> I'm not sure the tree idea would work, but it's definitely at least good
>>> to consider.
>> What don't you think would work about it?  One benefit of it is that
>> you don't need to know how long an instruction is.
>>
>>   Nate
>>
> Well, it does break your assumptions above. I don't know exactly why
> they were necessary to make so I can't say why that affects things, but
> I'll assume it does. Do you think the pointer chasing would be an issue?
> The chain could be pretty long and probably pretty hard to predict. I
> think functionally it'd probably work, but other issues might derail it.
> One way or the other it wouldn't be that hard to implement, so I might
> just give it a shot when I eventually start working on this and see what
> happens.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Re: [m5-dev] X86 performance

Reply via email to