On Mon, 23 Apr 2012, Gabe Black wrote:

On 04/23/12 07:50, Steve Reinhardt wrote:


On Sun, Apr 22, 2012 at 10:32 PM, Gabe Black wrote:

    On 04/22/12 20:42, Steve Reinhardt wrote:

            I'm jumping in partially informed, but can we just have
            two functions, like:

              zaps = setZaps(<something>);

            and

              zaps = modifyZaps(zaps, <something>);

            and then let the isa parser do its stuff naturally?

            Steve


        I don't think that is possible. This code will appear in the
        .isa file. In the .isa file, we cannot decide which version
        to use, because the CC bits to be written vary with the
        context in which the microop is used. So we need a run-time
        condition that evaluates whether the register needs to be
        read.


    Sorry for being way behind on this, but I'm curious just how many
    microops there are that have different impacts on the flags
    depending on their context, and how many different contexts there
    are.  I got the impression before that there would be this huge
    explosion of microops if we actually had a different ADD
    micro-op  (for example) for each set of bits that it could
    possibly write.  However, looking at Appendix E of Vol 3 of the
    AMD ISA manual, it looks like the set of bits written by each
    macro-instruction (at least) is pretty well defined.  I can
    believe that it's also valuable to have an ADD micro-op that
    doesn't affect flags for use in microcode sequences.  But is
    there a 3rd version of ADD we need that modifies some but not all
    the flags that the ADD macroinstruction does?

    Basically if (hypothetically exaggerating) 80% of the macro-ops
    can modify five different combinations of flags depending on
    context, then this complex mechanism makes sense.  On the other
    hand, if there are a small number of microops that need two
    versions (one that modifies a certain set of flags and one that
    doesn't), and maybe an even smaller number that legitimately
    decide which flags to look at at runtime, then this is starting
    to feel like overkill.  I'm sure the truth is somewhere in the
    middle, but I just don't understand the code well enough to know
    which extreme it's closer to... and if it's close to the former,
    I'd like to understand why.

    Steve


    Being well defined and being consistent are not the same things. I
    looked through the instruction reference one instruction at a time
    a year or two ago tabulating what flags they set, and it was not
    apparent that there were some small number of combinations. You
    are welcome to repeat the process, but I'll pass.


Are you saying that if I looked closely, I would get a different
result than the table that's already provided in Appendix E of Vol. 3?


Well that would have made life easier back then. That's basically the
information I was gathering. There are trends, but I don't think it's
consistent. That table also looks pretty short. I'm not sure it has all
the instructions in it, although I don't have time to look through it
very carefully right now.




    Also, we're not setting flags at the macroop level, we're setting
    them at the microop level.


Yea, I understand that, and already mentioned that above.  To quote:
"I can believe that it's also valuable to have an ADD micro-op that
doesn't affect flags for use in microcode sequences.  But is there a
3rd version of ADD we need that modifies some but not all the flags
that the ADD macroinstruction does?".


I'd be pretty surprised if the answer wasn't yes, although I don't have
time right now to dig through the microcode to find an example. You
should be able to grep and find all the adds fairly easily. You might
want to just peruse the microcode anyway since there are probably other
microops which end up being used differently than add but which would
have to follow the same scheme.




    Besides the fact that I don't think such a small set exists, I
    don't want to have to live with that small set forever, or
    redo all the existing macroops so they use the right version of
    the microops.


I'm not arguing for or against anything right now, I'm just trying to
understand the situation a little better.  So far all the design
discussions have implied that there are just crazy scads of microops
that could decide to read or write any arbitrary subset of flags at
any time, and that seems a little suspicious to me.  I can believe
that there is enough diversity to make all this mechanism worthwhile,
I'd just like to get more specific examples and a better handle on the
scope of the problem.

Right. Also keep in mind that we haven't implemented everything with x86
yet, so what we've used now isn't necessarily representative of
everything we'll need in the future.

Gabe


It seems like this is getting back to the original solution that I
proposed. The idea in that solution was that we would have three different
variants of the same microop:

 (a) one that does not write CC at all ==> no need to read CC

 (b) one that partially updates CC ==> need to read CC to perform the
     merge

 (c) one that completely updates CC ==> no need to read CC if no bits
     are being read explicitly.
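
A minimal sketch of the three cases above, written in Python for brevity rather than the C++ the microops are actually generated as. The 5-bit mask, the function names, and the explicit read_mask argument are assumptions made for illustration, not identifiers from the gem5 sources.

```python
# Hypothetical model: CC is treated as a small bit mask. The width and
# all names below are illustrative assumptions, not gem5 identifiers.
CC_ALL = 0b11111


def classify(write_mask, all_bits=CC_ALL):
    """Return 'a', 'b' or 'c' for the three cases listed above."""
    if (write_mask & all_bits) == 0:
        return "a"          # writes no CC bits
    if (write_mask & all_bits) == all_bits:
        return "c"          # overwrites every CC bit
    return "b"              # partial update, old value must be merged


def needs_cc_read(write_mask, read_mask=0, all_bits=CC_ALL):
    """True if the microop has to read the old CC value."""
    if read_mask & all_bits:
        return True         # some bits are consumed explicitly
    return classify(write_mask, all_bits) == "b"
```

In this model only case (b), or any case with an explicit read, forces a read of the old CC value.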

We already have different microop classes for cases (a) and (b), and (b) currently also covers case (c). We can detect in the .isa file that the microop, in the current context, writes all the CC bits and reads none, and therefore should not read the CC register. Such microops would then get mapped to case (c). There may be several issues involved here -

1. Is it possible to generate different microop classes for (b) and (c)? It seems we would have to template flag_code, so that for (c), we can replace the first read of the CC register with 0.

2. Suppose isa_parser gets as input the following statements --
        CC = 0;
        CC = CC | Carry;
Will the isa_parser recognize that there is no need to read the CC register?

3. What happens when we split the CC register? It seems there will be five parts of the register ([ZAPS], [O], [C], [rest], [ECF,EZF]). If we have three cases for each of these five parts, that means 3^5 = 243 different combinations in all. Can we somehow figure out which microop types actually need to be generated?
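
Two small Python sketches for questions 2 and 3. The first is a toy, purely syntactic operand scan. It is an assumption about how such a scan could work, not the isa_parser's actual logic, and it only understands plain `=` assignments. Under that assumption, the two statements in question 2 would still mark CC as a source, because the second statement mentions CC on the right-hand side. The second sketch simply enumerates the per-part cases from question 3. The part and case labels are made up for illustration.

```python
import re
from itertools import product


def scan_operand(code, name):
    """Toy scan: return (is_src, is_dest) for operand `name` in `code`."""
    is_src = is_dest = False
    # An occurrence followed by a single '=' counts as a write; any other
    # occurrence counts as a read. No data-flow analysis is attempted.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b\s*(=(?!=))?")
    for match in pattern.finditer(code):
        if match.group(1):
            is_dest = True
        else:
            is_src = True
    return is_src, is_dest


# Hypothetical labels for the five parts and the three cases per part.
PARTS = ["ZAPS", "O", "C", "rest", "ECF_EZF"]
CASES = ["none", "partial", "full"]


def all_combinations():
    """Every assignment of one case to each part."""
    return list(product(CASES, repeat=len(PARTS)))
```

With these definitions, scan_operand("CC = 0;\nCC = CC | Carry;", "CC") returns (True, True), and len(all_combinations()) is 243. Whether the real isa_parser does better than this toy scan is exactly the open question in item 2.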

--
Nilay

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
