Hello,

I've been wondering how memory-order violations between load/store pairs affect 
IPC, and I found that reducing violations by >90% results in an IPC increase of 
less than 5% across many different benchmarks. This seems counterintuitive to 
me, as I thought the overhead of flushing the pipeline and restarting execution 
would be steep.

Here is my experimental setup. I have two versions of gem5. The first is 
straight from the repo with no changes and uses store sets for its memory 
dependence prediction. The second version does blind speculation: it has only 
one change from the default, which is that it always returns "no violation" 
when asked for a memory dependence prediction. I did this by always returning 0 
from the checkInst function in store_set.cc. Both codebases were compiled into 
X86/gem5.fast, and both ran the same 11 programs from MiBench. I used the 
default O3 parameters, so the architecture is an 8-wide superscalar machine 
with a 192-entry ROB, 256 physical registers, and a 32-entry LQ/32-entry SQ. 
Each program ran for 10M instructions after fast-forwarding 50M instructions.


I then looked at the number of memory violations and the IPC for all the runs. 
Only crc, sha, and dijkstra got meaningful IPC gains, while the other 8 
programs did not see a benefit even though the number of memory violations was 
dramatically reduced. Results are here: https://pastebin.com/raw/HhUKMha5

So my question is: does gem5 penalize a memory-order violation less heavily 
than a branch misprediction? Or is the overhead of recovering from a memory 
violation truly not that big in practice? I would appreciate any insight that 
could help me reconcile what I'm seeing in the experimental data with my 
intuition.


_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
