> On 2011-09-01 11:54:03, Ali Saidi wrote:
> > src/arch/alpha/faults.hh, line 44
> > <http://reviews.m5sim.org/r/823/diff/1/?file=14607#file14607line44>
> >
> >     I initially tried the fault in sim approach, however that doesn't work 
> > because the pcstate object is resolved by the namespace and you can't do 
> > any template magic to make that happen. There needs to be seperate classes 
> > each defined in a namespace FooISA{ ... } or at least using namespace 
> > FooISA. While it could be done by doing using namespace TheISA that gets us 
> > further away from being able to support a single binary with multiple ISAs, 
> > so I don't like it nearly as much. As for the fault, it's a generic fault 
> > that could apply to all the CPU models (inorder, out-of-order, any other 
> > model) so I think it's more generic than just the o3 even though it's only 
> > used here. The fault could also be used in the future to get rid of some of 
> > the more specific squashDueToFoo instructions and just use more generic 
> > faults to do the job for you. 
> >     
> >
> 
> Gabe Black wrote:
>     I don't think TheISA would be a problem, especially if you don't use that 
> temporary since then you wouldn't need to specify the namespace at all. The 
> problem, though, seems to stem from the fact that you're reading and writing 
> the PC. Really all you need to do is write to the threadcontext to convince 
> O3 it needs to flush the pipe, and you can do that by writing anything. You 
> could even add a method which doesn't write anything and just triggers a 
> flush. That also touches on what I was saying a few comments down. If you 
> read the PC you're really creating a local copy of a structure and copying 
> around a chunk of memory, and then when you write it again you put it back. 
> You're not actually moving around useful data, though, so all that copying is 
> wasted effort. If you were to use, say, an int reg, you'd at least only copy 
> around an int.
>     
>     The problem with using faults more generally is that it presumes that a 
> CPU will care that you're writing to some state. That's really just how O3 
> works and isn't a requirement at all, so while it might be a useful technique 
> in other CPUs, it's not a generic mechanism. Also, if you're specialized to 
> O3 you can just call functions on the CPU and get things to flush for sure 
> without having to trick the CPU.
> 
> Ali Saidi wrote:
>     If that worked, it would be a nice idea (just need to poke the thread 
> context a little bit so that the CPU calls squashFromTC(). However, I went 
> back and looked at the code and it won't work because when the fault is 
> executed inSyscall (the badly named don't cause a fault on an external write 
> flag) is set to true, so external writes won't cause squashes. The only other 
> thing I can think of is to invent yet another mechanism in the O3 CPU to do 
> flushing, but there are more than enough of those already. Actually, I hope 
> that we could move some of the existing ones to attaching faults to 
> instructions over time. Thoughts?
> 
> Gabe Black wrote:
>     I thought about this a bit, and actually I don't think you really want to 
> flush. What it seems like you really want to do is to squash all newer 
> instructions once you discover the problem in the LSQ and then get fetch 
> going again from there. Then you don't have to wait for the instruction to 
> make it to the front of the commit queue before getting back on the right 
> path. So even though it would be yet another squash mechanism, a squash 
> mechanism is a better solution than a flushing mechanism. With a fault based 
> flush you have to drain out the whole pipe, but with a squash you'd only have 
> a bubble over part of the pipe.
>     
>     That said, you're right that the number and diversity of squashing 
> mechanisms in O3 is a bit overwhelming, confusing, error prone, hard to 
> maintain, etc., etc. Rather than not using them, though, it would be best to 
> actually fix that and make them more consistent and generalized. Ideally 
> there would be a single squashing mechanism which would just be applied in 
> each place it was needed, or at least a single system of mechanisms which 
> were all just parameterized or mildly tweaked versions of the same thing, 
> hopefully with lots of code reuse.

Yes, another way to handle it is to selectively flush newer instructions. 
However, the performance benefit of doing so is minimal. When checkSnoop() is 
called an a ReExec fault occurs the instruction has likely already left the LSQ 
is is waiting to commit. Dependent instructions have already been issued, could 
have been executed, and are also likely waiting to commit. Every instruction 
after that load must be thrown away. Is it possible that a few cycles could be 
saved by squashing early in this case and letting the fetch stage re-fetch, 
yes. Would it actually be done in a real processor? Maybe, but not at all 
guaranteed. There is very little benefit and a huge amount of complexity. The 
instruction is already going to be in the L1 cache because it just executed, so 
unlike a branch mispredict, there isn't going to be much in the way of benefit 
getting fetch to re-direct early. Because the instruction has already executed 
and is waiting to commit there aren't many cycles until 
 the flush will occur. Finally, It's a very rare case where this fault occurs, 
unlike the previous implementation that signaled a fault in many cases where it 
wasn't at all necessary. The proof is that this patch improves performance 
substantially on ARM and x86 which frequently trip the old mechanism even 
though they shouldn't because of delayed translations. 

I agree that ideally there would be a single squashing mechanism, but there 
isn't and that change is outside of the scope of this one.   


- Ali


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/823/#review1478
-----------------------------------------------------------


On 2011-08-18 19:41:19, Ali Saidi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.m5sim.org/r/823/
> -----------------------------------------------------------
> 
> (Updated 2011-08-18 19:41:19)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Summary
> -------
> 
> LSQ: Only trigger a memory  violation with a load/load if the value changes.
> 
> Only create a memory ordering violation when the value could have changed
> between two subsequent loads, instead of just when loads go out-of-order
> to the same address. While not very common in the case of Alpha, with
> an architecture with a hardware table walker this can happen reasonably
> frequently beacuse a translation will miss and start a table walk and
> before the CPU re-schedules the faulting instruction another one will
> pass it to the same address (or cache block depending on the dependency
> checking).
> 
> The performance improvement on SPEC benchmarks can be substantial (2-10%).
> 
> 
> Diffs
> -----
> 
>   src/arch/alpha/faults.hh 7f49e6a176b8 
>   src/arch/alpha/faults.cc 7f49e6a176b8 
>   src/arch/arm/faults.hh 7f49e6a176b8 
>   src/arch/mips/faults.hh 7f49e6a176b8 
>   src/arch/mips/faults.cc 7f49e6a176b8 
>   src/arch/power/SConscript 7f49e6a176b8 
>   src/arch/power/faults.hh 7f49e6a176b8 
>   src/arch/power/faults.cc PRE-CREATION 
>   src/arch/sparc/faults.hh 7f49e6a176b8 
>   src/arch/sparc/faults.cc 7f49e6a176b8 
>   src/arch/x86/faults.hh 7f49e6a176b8 
>   src/arch/x86/faults.cc 7f49e6a176b8 
>   src/cpu/base_dyn_inst.hh 7f49e6a176b8 
>   src/cpu/base_dyn_inst_impl.hh 7f49e6a176b8 
>   src/cpu/o3/lsq_impl.hh 7f49e6a176b8 
>   src/cpu/o3/lsq_unit.hh 7f49e6a176b8 
>   src/cpu/o3/lsq_unit_impl.hh 7f49e6a176b8 
> 
> Diff: http://reviews.m5sim.org/r/823/diff
> 
> 
> Testing
> -------
> 
> This patch has been tested with a couple of self-checking hand crafted 
> programs I wrote to stress ordering between two cores. With injected bugs it 
> definitely shows a failure and it doesn't with this implementation (and it 
> caught one issue during development).
> 
> 
> Thanks,
> 
> Ali
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Reply via email to