On 04/23/12 14:45, Nilay Vaish wrote:
> On Mon, 23 Apr 2012, Gabe Black wrote:
>
>> On 04/23/12 07:50, Steve Reinhardt wrote:
>>>
>>>
>>> On Sun, Apr 22, 2012 at 10:32 PM, Gabe Black <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>>     On 04/22/12 20:42, Steve Reinhardt wrote:
>>>>
>>>>             I'm jumping in partially informed, but can we just have
>>>>             two functions, like:
>>>>
>>>>               zaps = setZaps(<something>);
>>>>
>>>>             and
>>>>
>>>>               zaps = modifyZaps(zaps, <something>);
>>>>
>>>>             and then let the isa parser do its stuff naturally?
>>>>
>>>>             Steve
>>>>
>>>>
>>>>         I don't think that is possible. This code will appear in the
>>>>         .isa file. In the .isa file, we cannot decide which version
>>>>         to use as the CC bits to be written vary with the context in
>>>>         which the microop is used. So, we need a run-time condition
>>>>         that evaluates whether the register needs to be read.
>>>>
>>>>
>>>>     Sorry for being way behind on this, but I'm curious just how many
>>>>     microops there are that have different impacts on the flags
>>>>     depending on their context, and how many different contexts there
>>>>     are.  I got the impression before that there would be this huge
>>>>     explosion of microops if we actually had a different ADD
>>>>     micro-op  (for example) for each set of bits that it could
>>>>     possibly write.  However, looking at Appendix E of Vol 3 of the
>>>>     AMD ISA manual, it looks like the set of bits written by each
>>>>     macro-instruction (at least) is pretty well defined.  I can
>>>>     believe that it's also valuable to have an ADD micro-op that
>>>>     doesn't affect flags for use in microcode sequences.  But is
>>>>     there a 3rd version of ADD we need that modifies some but not all
>>>>     the flags that the ADD macroinstruction does?
>>>>
>>>>     Basically if (hypothetically exaggerating) 80% of the macro-ops
>>>>     can modify five different combinations of flags depending on
>>>>     context, then this complex mechanism makes sense.  On the other
>>>>     hand, if there are a small number of microops that need two
>>>>     versions (one that modifies a certain set of flags and one that
>>>>     doesn't), and maybe an even smaller number that legitimately
>>>>     decide which flags to look at at runtime, then this is starting
>>>>     to feel like overkill.  I'm sure the truth is somewhere in the
>>>>     middle, but I just don't understand the code well enough to know
>>>>     which extreme it's closer to... and if it's close to the former,
>>>>     I'd like to understand why.
>>>>
>>>>     Steve
>>>>
>>>
>>>     Being well defined and being consistent are not the same things. I
>>>     looked through the instruction reference one instruction at a time
>>>     a year or two ago tabulating what flags they set, and it was not
>>>     apparent that there were some small number of combinations. You
>>>     are welcome to repeat the process, but I'll pass.
>>>
>>>
>>> Are you saying that if I looked closely, I would get a different
>>> result than the table that's already provided in Appendix E of Vol. 3?
>>
>>
>> Well that would have made life easier back then. That's basically the
>> information I was gathering. There are trends, but I don't think it's
>> consistent. That table also looks pretty short. I'm not sure it has all
>> the instructions in it, although I don't have time to look through it
>> very carefully right now.
>>
>>
>>>
>>>
>>>     Also, we're not setting flags at the macroop level, we're setting
>>>     them at the microop level.
>>>
>>>
>>> Yea, I understand that, and already mentioned that above.  To quote:
>>> "I can believe that it's also valuable to have an ADD micro-op that
>>> doesn't affect flags for use in microcode sequences.  But is there a
>>> 3rd version of ADD we need that modifies some but not all the flags
>>> that the ADD macroinstruction does?".
>>
>>
>> I'd be pretty surprised if the answer wasn't yes, although I don't have
>> time right now to dig through the microcode to find an example. You
>> should be able to grep and find all the adds fairly easily. You might
>> want to just peruse the microcode anyway since there are probably other
>> microops which end up being used differently than add but which would
>> have to follow the same scheme.
>>
>>
>>>
>>>
>>>     Besides the fact that I don't think such a small set exists, I
>>>     don't want to have to live with that small set for forever, or
>>>     redo all the existing macroops so they use the right version of
>>>     the microops.
>>>
>>>
>>> I'm not arguing for or against anything right now, I'm just trying to
>>> understand the situation a little better.  So far all the design
>>> discussions have implied that there are just crazy scads of microops
>>> that could decide to read or write any arbitrary subset of flags at
>>> any time, and that seems a little suspicious to me.  I can believe
>>> that there is enough diversity to make all this mechanism worthwhile,
>>> I'd just like to get more specific examples and a better handle on the
>>> scope of the problem.
>>
>> Right. Also keep in mind that we haven't implemented everything with x86
>> yet, so what we've used now isn't necessarily representative of
>> everything we'll need in the future.
>>
>> Gabe
>>
>
> It seems like this is getting back to the original solution that I
> proposed. The idea in that solution was that we would have three
> different variants of the same microop -
>
>  (a) one that does not write CC at all ==> no need to read CC
>
>  (b) one that partially updates CC ==> need to read CC to perform the
> merge
>
>  (c) one that completely updates CC ==> no need to read CC if no bits
> are being read explicitly.
>
> We already have different microop classes for cases (a) and (b), and
> (b) currently also covers case (c). We can detect in the .isa file
> that the microop in the current context writes all the CC bits and
> reads none, and therefore should not read the CC register. These
> microops would then get mapped to case (c). There may be several
> issues involved here -
>
> 1. Is it possible to generate different microop classes for (b) and
> (c)? It seems we would have to template flag_code, so that for (c), we
> can replace the first read of the CC register with 0.
>
> 2. Suppose isa_parser gets as input the following statements --
>     CC = 0;
>     CC = CC | Carry;
> Will the isa_parser recognize that there is no need to read CC register?
>
> 3. What happens when we split the CC register? It seems there will be
> five parts of the register ([ZAPS], [O], [C], [rest], [ECF,EZF]). If
> we have three cases for each of these five parts, that means we
> will have 3^5 = 243 different combinations in all. Can we somehow
> figure out the microop types that need to be generated?
>
> -- 
> Nilay
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
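
For concreteness, the three cases (a), (b) and (c) in the quoted message
can be sketched as a read-elision check plus a masked merge. This is only
an illustration in Python, not code from the .isa files or the isa_parser:
the bit layout, `needs_cc_read` and `update_cc` are all hypothetical names
made up for the example.

```python
# Hypothetical flag-bit layout, for illustration only (not gem5's encoding).
CF, PF, AF, ZF, SF, OF = (1 << i for i in range(6))
ALL_CC = CF | PF | AF | ZF | SF | OF


def needs_cc_read(write_mask, read_mask=0):
    """Return True if the microop must source the old CC value."""
    if read_mask:
        # Some flag is consumed explicitly (e.g. an add-with-carry).
        return True
    if write_mask == 0:
        # Case (a): CC is not written at all.
        return False
    if write_mask & ALL_CC == ALL_CC:
        # Case (c): every bit is overwritten, the old value is irrelevant.
        return False
    # Case (b): partial update, the old bits are needed for the merge.
    return True


def update_cc(old_cc, new_bits, write_mask):
    """Merge freshly computed bits into CC under write_mask."""
    return (old_cc & ~write_mask & ALL_CC) | (new_bits & write_mask)
```

In this model only case (b), or a microop that explicitly consumes a flag,
needs CC as a source operand.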

The point of my email is that we're *not* ending up back there, and that
we *do* still need to do things differently. I'm confident the number of
microops we'd end up with would be prohibitive, and going the other way
and capping that number would make writing new microcode restrictive.
We get pinched between those two options and don't have a good place to
be in the middle. I may be wrong, but that's definitely what I expect,
and I did write almost all of the microcode (granted, a while ago), so
I'm decently familiar with it. Also, [ECF] and [EZF] should be separate.
I'm 90% sure I remember places where those are set by different microops
and need to exist independently.

Gabe
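
To put numbers on the variant count: assuming, as in the quoted message,
that each flag group independently falls into one of three cases (not
written, partially written, fully written), the combinations can be
enumerated directly. A rough Python sketch follows; the group names are
taken from the thread, while `CASES` and `count_variants` are hypothetical
helpers for illustration.

```python
from itertools import product

# The three per-group cases (a), (b), (c) from the quoted message.
CASES = ("none", "partial", "full")


def count_variants(groups):
    """Count microop variants if every group independently picks a case."""
    return sum(1 for _ in product(CASES, repeat=len(groups)))


merged = ["ZAPS", "O", "C", "rest", "ECF,EZF"]   # split as quoted
split = ["ZAPS", "O", "C", "rest", "ECF", "EZF"]  # [ECF] and [EZF] separate

print(count_variants(merged))  # 243
print(count_variants(split))   # 729
```

Under the same assumption, keeping [ECF] and [EZF] separate raises the
upper bound from 243 to 729 variants per microop.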