Yeah, I'll see if I get time to do a fuller solution later this week.  I
also realized the current patch would break FS mode, since the frontend
signals instruction fetch page faults by creating a nop with a fault
attached (this patch would just discard that nop).  So, checking for a
fault would be required before discarding.

Discarding unconditional jumps also works.  I did a quick mod that
discarded them when the following was true: "if (isUncondCtrl() &&
isDirectCtrl() && !inst->writesRegs())", where writesRegs() returns
true if the instruction writes something other than the PC or the zero
reg (on ARM).  But using a flag in the ISA files would be a better way than
checking the destination registers explicitly.
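
For reference, a rough sketch of how the combined check (fault test first,
then nop or unconditional direct jump) might look.  This is only an
illustration under assumptions: MockInst, hasFault() and
canDiscardAtDecode() are made-up names, not the real DynInst interface, and
writesRegs() stands for the helper from my quick mod rather than anything
in the tree.

```cpp
#include <cassert>

// Hypothetical stand-in for a decoded instruction. This is NOT gem5's
// DynInst; the fields and helpers exist only for illustration.
struct MockInst
{
    bool nop = false;              // decoded as a nop
    bool uncondCtrl = false;       // unconditional control transfer
    bool directCtrl = false;       // target encoded in the instruction
    bool writesOtherRegs = false;  // writes a reg other than the PC/zero reg
    bool fault = false;            // e.g. an instruction fetch page fault

    bool isNop() const { return nop; }
    bool isUncondCtrl() const { return uncondCtrl; }
    bool isDirectCtrl() const { return directCtrl; }
    bool writesRegs() const { return writesOtherRegs; }
    bool hasFault() const { return fault; }
};

// Returns true if decode may drop the instruction instead of sending it
// to the backend.
inline bool
canDiscardAtDecode(const MockInst &inst)
{
    // A faulting nop has to carry its fault onward, so never drop it.
    if (inst.hasFault())
        return false;

    if (inst.isNop())
        return true;

    // Unconditional direct jumps with no other register side effects.
    return inst.isUncondCtrl() && inst.isDirectCtrl() && !inst.writesRegs();
}
```

With an ISA-level flag, the writesRegs() test would presumably be replaced
by a single flag check, which is the cleaner option mentioned above.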

On Tue, Apr 2, 2013 at 11:50 AM, Korey Sewell <[email protected]> wrote:

> Hi Mitch,
> I see what you are saying about the atomicity aspect of the IT block. Those
> are fair points. Likewise, it's fair to optimize them out past decode
> like your patch does.
>
> I'm looking for something extra such that another CPU model (or code) will
> not look at that instruction and think it's just a "nop". For instance, the
> prefetch instruction is marked with a "Prefetch" flag which allows a CPU
> model to check for prefetch and handle them differently if it wishes to.
>
> To me, it looks like the converged solution is:
> 1) add a flag called "isPurePredicate" (or a better name!) in DynInst.
> 2) Then, in your patch you can give the instruction two flags: "isNop" and
> "isPurePredicate".
> 3) Finally, when the instruction is removed from the CPU, you check to see
> if the "isPurePredicate" is asserted and if the instruction is not
> squashed.  If that condition is true, increment a stat counting how many
> times we performed this optimization.
>
> I'm hoping this both eliminates the IT instruction from the back-end (isNop
> flag) and then allows for a fair accounting of that optimization in the
> end-of-simulation stats (isPurePredicate flag).
>
> Would you agree with that?
>
>
>
>
> On Mon, Apr 1, 2013 at 12:14 PM, Mitch Hayenga <
> [email protected]
> > wrote:
>
> > "Lastly, this optimization could also be applied to any branch
> > instructions that get resolved at decode, right?"
> > That's a good one that I'm definitely going to implement.
> >
> > I think whoever wrote the current IPC counting mechanism was trying to
> > measure backend IPC and not total IPC.  This makes sense by counting data
> > prefetches but not instruction prefetches towards IPC.
> >
> > I'm still with ignoring IT instructions though, since it was originally
> > created when ARM shrank their opcodes for the THUMB instruction set and
> > didn't have enough bits to do their normal predication encoding.  IT
> > instructions just allow the decoder to save and append these bits to
> > recreate the full ARM opcode.  They've also made IT blocks be as atomic
> > as possible (only the last instruction is allowed to be a branch and
> > jumps, other than exception returns, into IT blocks are not permitted).
> > So, in my mind IT instructions are effectively part of the "instruction"
> > that the entire block comprises.
> >
> >
> > On Mon, Apr 1, 2013 at 11:16 AM, Korey Sewell <[email protected]> wrote:
> >
> > > Hi Mitch,
> > > Thanks for the quick response. I pretty much agree with the sentiment
> > > that this is a valid optimization but probably disagree a bit on going
> > > forward with (3).
> > >
> > > I think you pose a valid question of "If it's already acceptable to not
> > > count ISA-level nops towards IPC, why not IT instructions as well?". My
> > > answer to that would be that whereas nops/prefetches can safely be
> > > ignored and not affect instruction order, you can't literally ignore an
> > > IT instruction without affecting instruction order.
> > >
> > > If I err in that reasoning, then I think I'd be OK with #3, but if it's
> > > the case where the output of the IT instruction is actually needed to
> > > alter control flow then I don't think it's OK to treat it as a nop and
> > > ignore it in stats.
> > >
> > > I'd be for #1 actually. Although it may sound "hackish", each ISA does
> > > have its own quirks and at commit I wouldn't be against checking the
> > > ISA-specific state to figure out if this were an optimized instruction
> > > (mark a flag in the DynInst) and when it leaves the O3 cpu
> > > (instDone()?), check to see if this flag is asserted but the committed
> > > flag isn't. If not, count it as a committed op.
> > >
> > > Lastly, this optimization could also be applied to any branch
> > > instructions that get resolved at decode, right?
> > >
> > > -Korey
> > > On Sun, Mar 31, 2013 at 11:36 PM, Mitch Hayenga <
> > > [email protected]> wrote:
> > >
> > >> Re-sending this so it gets sent to the list.
> > >>
> > >> Yes, right now this would not properly credit IPC for IT instructions,
> > >> since nops don't count towards IPC.  I overlooked that since I use
> > >> execution time as my evaluation metric.
> > >>
> > >> Three quick thoughts on this...
> > >> 1)  A quick solution would be to look at the ITstate of committing ops
> > >> and infer a dropped IT instruction.  This would be a bit hackish and
> > >> ARM specific though.
> > >> 2)  Maintaining the current method of sending nops through the
> > >> pipeline could be made to work, by going through and modifying the
> > >> code to be sure nops did not count against bandwidth or size
> > >> restrictions.  You'd also have to worry about not impacting stats like
> > >> rob reads/writes that the McPAT users would feed to their power
> > >> models.  And at commit you'd still have to special case the IT
> > >> instruction to make sure it got counted.
> > >> 3)  If it's already acceptable to not count ISA-level nops towards
> > >> IPC, why not IT instructions as well?  They do feed some information
> > >> to the decoder, but overall their relative work isn't much more than a
> > >> nop (being fetched + decoded).  They also potentially do far less work
> > >> than a prefetch instruction (which is also not counted).
> > >>
> > >> I personally like 3, since the current subset of instructions counted
> > >> towards IPC already seems to have a bit of arbitrariness and it would
> > >> require no changes.
> > >>
> > >> PS: I coded this up because I noticed a few times where up to 1/5 of
> > >> my instruction window could be occupied by "useless" IT instructions.
> > >>
> > >>
> > >>
> > >> On Sun, Mar 31, 2013 at 10:50 PM, Korey Sewell <[email protected]> wrote:
> > >>
> > >>> Hi Mitch,
> > >>> Another thing I wonder about with this patch is the impact on stats.
> > >>>
> > >>> If I recall right, O3 throws away nops. So when we talk about IPC
> > >>> with this patch in, we aren't giving the CPU "credit" for doing what's
> > >>> necessary for the ARM IT instruction, right?
> > >>>
> > >>> I'm thinking there may need to be another patch supplementing this
> > >>> one that counts the # of times this optimization happens. That way,
> > >>> we have all the bases covered for instruction/IPC counting.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> -Korey
> > >>>
> > >>>
> > >>>
> > >>> On Sat, Mar 30, 2013 at 8:54 AM, Mitch Hayenga <
> > >>> [email protected]> wrote:
> > >>>
> > >>>>
> > >>>>
> > >>>> > On March 30, 2013, 7:31 a.m., Ali Saidi wrote:
> > >>>> > > While this seems harmless enough, I wonder if there is some
> > >>>> > > interaction between faults/interrupts and the instruction that
> > >>>> > > we should worry about. I haven't given it enough thought to say
> > >>>> > > either way, but it seems like it could be a concern.
> > >>>>
> > >>>> I thought about it somewhat, since IT blocks are required to be able
> > >>>> to handle faults and return to execution properly within an IT
> > >>>> block.  It seems the gem5 solution is probably similar to what a real
> > >>>> processor implementation would use, appending the IT state to the
> > >>>> PC.  So an exception/interrupt within an IT block would just return
> > >>>> and the decoder would pick off the extra IT bits from the PC (that
> > >>>> detail how to predicate up to the next 3 ops).  If the
> > >>>> exception/interrupt was just prior to the IT instruction, it would
> > >>>> just get sent to the decoder like normal.
> > >>>>
> > >>>> I was thinking more on the "discarding nops at decode" part.  The
> > >>>> only case I think that could give that trouble is self-modifying
> > >>>> code, since you'd want to track instruction addresses to know if a
> > >>>> snooped write changed a currently executing instruction.  But gem5
> > >>>> doesn't really provide that now anyway and you could use cheaper
> > >>>> structures to perform that operation (since false positives would be
> > >>>> ok).
> > >>>>
> > >>>>
> > >>>> - Mitch
> > >>>>
> > >>>>
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>> This is an automatically generated e-mail. To reply, visit:
> > >>>> http://reviews.gem5.org/r/1805/#review4177
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>>
> > >>>> On March 29, 2013, 7:47 p.m., Mitch Hayenga wrote:
> > >>>> >
> > >>>> > -----------------------------------------------------------
> > >>>>
> > >>>> > This is an automatically generated e-mail. To reply, visit:
> > >>>> > http://reviews.gem5.org/r/1805/
> > >>>> > -----------------------------------------------------------
> > >>>> >
> > >>>> > (Updated March 29, 2013, 7:47 p.m.)
> > >>>> >
> > >>>> >
> > >>>> > Review request for Default.
> > >>>> >
> > >>>> >
> > >>>> > Description
> > >>>> > -------
> > >>>>
> > >>>> >
> > >>>> > Mark ARM IT (if-then) instructions as nops.
> > >>>> >
> > >>>> > ARM's IT instructions predicate up to the next 4 instructions on
> > >>>> > various condition codes.  IT instructions really just send control
> > >>>> > signals to the decoder; after decode they do not read or write any
> > >>>> > registers.  Marking them as nops (along with the other patch that
> > >>>> > drops nops at decode) saves execution resources and bandwidth.
> > >>>> >
> > >>>> >
> > >>>> > Diffs
> > >>>> > -----
> > >>>> >
> > >>>> >   src/arch/arm/isa/insts/misc.isa 47591444a7c5
> > >>>> >
> > >>>> > Diff: http://reviews.gem5.org/r/1805/diff/
> > >>>> >
> > >>>> >
> > >>>> > Testing
> > >>>> > -------
> > >>>> >
> > >>>> > A fast libquantum run.
> > >>>> >
> > >>>> >
> > >>>> > Thanks,
> > >>>> >
> > >>>> > Mitch Hayenga
> > >>>> >
> > >>>> >
> > >>>>
> > >>>> _______________________________________________
> > >>>> gem5-dev mailing list
> > >>>> [email protected]
> > >>>> http://m5sim.org/mailman/listinfo/gem5-dev
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> - Korey
> > >>>
> > >>
> > >>
> > >
> > >
> > > --
> > > - Korey
> > >
> >
>
>
>
> --
> - Korey
>