This isn't constraining anything. Either you check if the the state variable in the dyn inst or you don't. If you don't check it, it's 1 byte of data that doesn't matter. Another example, is what if I want a smarter CPU model that is able to issue predicated instructions before the destination registers are ready. Unless this piece of state is available that would be impossible to do.
The point should be to create interfaces that encapsulate features, and not architectures. Predication is a feature of an architecture. Itanium predicates lots of instructions too... Ali On Fri, 20 Aug 2010 15:18:45 -0400, Gabriel Michael Black <[email protected]> wrote: > I'd like the CPUs to remain as dumb as possible as far as ISA > semantics and mechanisms so neither they nor the ISAs are > unnecessarily constrained or complicated. In light of that, I think > it's actually better if the CPU has no idea what predication is or > when it may or may not have happened. > > This is a class of stats I don't think we have a good way to collect, > though. We might want to know how often store conditionals fail, how > often compare and swaps fail, how often a register window push/pop > needs to fill/spill, etc. I think a more general mechanism that solved > -this- problem would solve the larger problem, and I think could be > quite useful. > > Gabe > > Quoting Ali Saidi <[email protected]>: > >> >> Yes, but we've not really solved the larger problem that is a CPU model >> should be able to see if a instruction is predicated false. For example, >> it's impossible to create a statistics that is the number of instructions >> that were predicated false when we should just be able to do if >> (inst->readPredicate() == false) predicatedFalse++; in the CPU models. >> >> Ali >> >> >> On Fri, 20 Aug 2010 11:42:23 -0700, Gabe Black <[email protected]> >> wrote: >>> Well the problem is just the load/store instructions, right? Otherwise >>> the execute method can just do/not do whatever it needs without having >>> to coordinate with the CPU. If you make your proposed isa parser >>> changes, the instructions should be able to handle all the other cases >>> internally without too much fuss. >>> >>> Gabe >>> >>> Ali Saidi wrote: >>>> Interesting... we're top posting on this thread and bottom posting on >> the >>>> other one... >>>> >>>> Anyway... yes you're correct initiateAcc() is called one instruction at >>>> at >>>> time. >>>> >>>> For memory ops alone the o3 model could be changed in >> executeLoad/Store() >>>> inst->predicated = false; // assume failure >>>> load_fault = inst->initiateAcc(); >>>> >>>> if (inst->predicated) .... >>>> >>>> However, to make this work, the read()/write() methods would have to >> set >>>> inst->predicated = true; >>>> >>>> That isn't so bad, but I still have two problems with this method: >>>> 1) This only works for load/store instructions. There isn't a >>>> corresponding way to do this for any other type of op since they're in >>>> either case they're going to call setInt/FloatReg(). >>>> 2) I'm not really a fan on the idea that the absence of a call triggers >>>> is >>>> what triggers the mechanism. It seems like a convoluted way to instead >> of >>>> having: >>>> if (testPredicate(....)) { >>>> ..... >>>> } else { >>>> .... >>>> xc->setPredicate(false); >>>> } >>>> >>>> Doing it the way it's currently implemented also means that the same >>>> mechanism works for multiple cpu models that might need it. >>>> >>>> Thanks, >>>> Ali >>>> >>>> >>>> >>>> On Thu, 19 Aug 2010 20:45:41 -0700, Gabe Black <[email protected]> >>>> wrote: >>>> >>>>> O3 only -seems- to execute multiple memory instructions at a time. >> Each >>>>> initiateAcc is called one at a time and completes execution before the >>>>> next starts, so the same thing should apply as far as that goes. >>>>> >>>>> Gabe >>>>> >>>>> Ali Saidi wrote: >>>>> >>>>>> Remember that the timing cpu is only executing one instruction at a >>>>>> >>>> time. >>>> >>>>>> If the instruction calls read() and no access isn't set the timing >> cpu >>>>>> packages up the request ships it out and sets it's state to >>>>>> DcacheWaitResponse. If the instruction doesn't call read() it >> continues >>>>>> on >>>>>> like nothing happened {because it didn't). With the o3 cpu there are >>>>>> multiple instructions in-flight, so simply waiting for it to call >> read >>>>>> >>>> or >>>> >>>>>> not isn't an option. Something has to be passed through the xc to >> tell >>>>>> the >>>>>> cpu that this instruction is done since the normal mechanism won't >> take >>>>>> care of it. The way this works now is by passing back something other >>>>>> than >>>>>> NoFault. However, the instruction didn't actually fault, so then we >>>>>> >>>> would >>>> >>>>>> have to special case everything that reads that fault later on in the >>>>>> pipeline to say if its a predicationfault, do the same thing you >> would >>>>>> >>>> do >>>> >>>>>> for no fault. This seems worse and more error prone. >>>>>> >>>>>> Thanks, >>>>>> ali >>>>>> >>>>>> >>>>>> On Thu, 19 Aug 2010 15:16:22 -0400, Gabriel Michael Black >>>>>> <[email protected]> wrote: >>>>>> >>>>>> >>>>>>> I don't think that's true, but I may be confused. I think, at least >> >>>>>>> for the timing CPU, that it checks if a read/write was called and >>>>>>> doesn't just fall through. The timing CPU would wait forever for a >>>>>>> read/write response that would never come otherwise. (digs around a >> >>>>>>> little) I think it's sort of like what I described. The CPU will >>>>>>> continue if it's not waiting for anything, which would be the case >> if >>>>>>> >>>> >>>> >>>>>>> no access actually happened. We could probably get the same behavior >> >>>>>>> if we checked if the instruction was waiting for a read/write >>>>>>> response, but that might be kept somewhere annoying to get at. >>>>>>> Generally, if we can hide the existence of predication from the >> CPUs, >>>>>>> >>>> >>>> >>>>>>> I think that'll make everyones life easier (except for the ARM >> ISA's, >>>>>>> >>>> >>>> >>>>>>> I suppose). >>>>>>> >>>>>>> Gabe >>>>>>> >>>>>>> Quoting Ali Saidi <[email protected]>: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> It's not the same issue here. The simple cpus just have their >>>>>>>> execute/completeAccess methods guarded by a predicate condition >> test. >>>>>>>> >>>>>>>> >>>>>> If >>>>>> >>>>>> >>>>>>>> nothing happens in there, so be it and the cpu goes onto the next >>>>>>>> instruction without complaint. The out of order cpu on the other >>>>>>>> >>>> hand >>>> >>>>>>>> needs to know if the instruction was predicated false so it can >>>>>>>> >>>> notify >>>> >>>>>>>> commit that it is complete, even though it hasn't done anything. If >>>>>>>> commit >>>>>>>> isn't notified, the instruction will never commit and the processor >>>>>>>> >>>>>>>> >>>>>> will >>>>>> >>>>>> >>>>>>>> stall. >>>>>>>> >>>>>>>> This information should clearly belong in the dyninst. Unless there >>>>>>>> >>>> is >>>> >>>>>>>> some other way to access the class from the the isa description, I >>>>>>>> >>>>>>>> >>>>>> think >>>>>> >>>>>> >>>>>>>> the change is correct. An alternate approach would be to have the >>>>>>>> >>>>>>>> >>>>>> method >>>>>> >>>>>> >>>>>>>> in >>>>>>>> threadstate do nothing because it's unimportant for it. >>>>>>>> >>>>>>>> Ali >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, 17 Aug 2010 19:04:10 -0400, Gabriel Michael Black >>>>>>>> <[email protected]> wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Sorry if I wasn't clear before (I reread my post and it sounded a >>>>>>>>> little vague) but what the simple CPU does is keep track of >> whether >>>>>>>>> the supposed memory instruction actually calls read or write on >> the >>>>>>>>> execution context. If not, then the CPU doesn't try to complete >> any >>>>>>>>> access, it just considers that part over. Ideally we can do the >> same >>>>>>>>> thing here. >>>>>>>>> >>>>>>>>> Gabe >>>>>>>>> >>>>>>>>> Quoting Ali Saidi <[email protected]>: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Anyone have comments on this? It seems like this is the only way >> to >>>>>>>>>> access >>>>>>>>>> the DynInst from the isa description. Threadstate does have the >>>>>>>>>> >>>>>>>>>> >>>>>> current >>>>>> >>>>>> >>>>>>>>>> instruction in it, as well as things like "Temporary storage to >>>>>>>>>> >>>> pass >>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> the >>>>>>>> >>>>>>>> >>>>>>>>>> source address from copy_load to". It doesn't seem to out of >> place >>>>>>>>>> >>>> to >>>> >>>>>>>>>> include current instruction predication state in there. >>>>>>>>>> >>>>>>>>>> Ali >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, 14 Aug 2010 07:08:35 -0000, "Gabe Black" >>>>>>>>>> >>>>>>>>>> >>>>>>>> <[email protected]> >>>>>>>> >>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> ----------------------------------------------------------- >>>>>>>>>>> This is an automatically generated e-mail. To reply, visit: >>>>>>>>>>> http://reviews.m5sim.org/r/177/#review185 >>>>>>>>>>> ----------------------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> src/cpu/thread_context.hh >>>>>>>>>>> <http://reviews.m5sim.org/r/177/#comment326> >>>>>>>>>>> >>>>>>>>>>> This isn't really a property of a thread, it's the property >> of >>>>>>>>>>> >>>> a >>>> >>>>>>>>>>> single instruction. I don't think this is being done in the >>>>>>>>>>> >>>>>>>>>>> >>>>>> right >>>>>> >>>>>> >>>>>>>>>>> place. I think we should have a discussion on m5-dev to >>>>>>>>>>> >>>>>>>>>>> >>>>>> determine >>>>>> >>>>>> >>>>>>>>>> the >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> best way to handle this. There was a little code added to >> the >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> simple >>>>>>>> >>>>>>>> >>>>>>>>>>> CPU that does what this is supposed to do if a memory >>>>>>>>>>> >>>>>>>>>>> >>>>>> instruction >>>>>> >>>>>> >>>>>>>>>>> didn't actually read or write memory, and I think this is a >>>>>>>>>>> >>>>>>>>>>> >>>>>> better >>>>>> >>>>>> >>>>>>>>>> way >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> to handle this. We should have a discussion about this on >>>>>>>>>>> >>>>>>>>>>> >>>>>> m5-dev, >>>>>> >>>>>> >>>>>>>>>>> especially since it touches lots of low level bits like >>>>>>>>>>> >>>>>>>>>>> >>>>>> *contexts, >>>>>> >>>>>> >>>>>>>>>>> instruction behavior, CPUs, etc. These sorts of changes need >>>>>>>>>>> >>>> to >>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>> be >>>>>> >>>>>> >>>>>>>>>> made >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> carefully. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> - Gabe >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2010-08-13 10:12:35, Ali Saidi wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> ----------------------------------------------------------- >>>>>>>>>>>> This is an automatically generated e-mail. To reply, visit: >>>>>>>>>>>> http://reviews.m5sim.org/r/177/ >>>>>>>>>>>> ----------------------------------------------------------- >>>>>>>>>>>> >>>>>>>>>>>> (Updated 2010-08-13 10:12:35) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Review request for Default and Min Kyu Jeong. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Summary >>>>>>>>>>>> ------- >>>>>>>>>>>> >>>>>>>>>>>> ARM/O3: store the result of the predicate evaluation in DynInst >>>>>>>>>>>> >>>> or >>>> >>>>>>>>>>>> Threadstate. >>>>>>>>>>>> THis allows the CPU to handle predicated-false instructions >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> accordingly. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> This particular patch makes loads that are predicated-false to >> be >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>> sent >>>>>>>> >>>>>>>> >>>>>>>>>>>> straight to the commit stage directly, not waiting for return >> of >>>>>>>>>>>> >>>>>>>>>>>> >>>>>> the >>>>>> >>>>>> >>>>>>>>>> data >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> that was never requested since it was predicated-false. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Diffs >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>> src/arch/arm/isa/templates/mem.isa 3c48b2b3cb83 >>>>>>>>>>>> src/arch/arm/isa/templates/pred.isa 3c48b2b3cb83 >>>>>>>>>>>> src/cpu/base_dyn_inst.hh 3c48b2b3cb83 >>>>>>>>>>>> src/cpu/base_dyn_inst_impl.hh 3c48b2b3cb83 >>>>>>>>>>>> src/cpu/o3/lsq_unit_impl.hh 3c48b2b3cb83 >>>>>>>>>>>> src/cpu/simple/base.hh 3c48b2b3cb83 >>>>>>>>>>>> src/cpu/simple_thread.hh 3c48b2b3cb83 >>>>>>>>>>>> src/cpu/thread_context.hh 3c48b2b3cb83 >>>>>>>>>>>> >>>>>>>>>>>> Diff: http://reviews.m5sim.org/r/177/diff >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Testing >>>>>>>>>>>> ------- >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Ali >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> m5-dev mailing list >>>>>>>>>>> [email protected] >>>>>>>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> m5-dev mailing list >>>>>>>>>> [email protected] >>>>>>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> m5-dev mailing list >>>>>>>> [email protected] >>>>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> _______________________________________________ >>>>>> m5-dev mailing list >>>>>> [email protected] >>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> m5-dev mailing list >>>>> [email protected] >>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>> >>>> _______________________________________________ >>>> m5-dev mailing list >>>> [email protected] >>>> http://m5sim.org/mailman/listinfo/m5-dev >>>> >>> >>> _______________________________________________ >>> m5-dev mailing list >>> [email protected] >>> http://m5sim.org/mailman/listinfo/m5-dev >> _______________________________________________ >> m5-dev mailing list >> [email protected] >> http://m5sim.org/mailman/listinfo/m5-dev >> > > > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
