Hi all,

Correct me if I am wrong, but as I understand the code, the D-Cache is
only accessed for a store instruction when the store writes back, that
is, when it is at the head of the LSQ, has been committed, and a port is
available.

Now, if there is a miss in the L1 (and possibly the L2), we have to
suffer the full miss latency and only then do the writeback. However,
the effective address of the store is computed at execute, meaning that
we should be able to probe the cache at execute and bring the line in if
it is not there. If it is there, we only have to wait for the store to
commit and write back.

I agree that the SQ is normally not on the critical path: stores in it
are already committed, so they do not block other instructions. That
only holds while the SQ has free entries, though. When it is full,
hiding part of the miss latency lets the store write back sooner and
release its SQ slot sooner. I grant you that this may not impact
performance much, but I am trying to find a quick and dirty way to
implement it and find out. For now, I simply issue a HW prefetch
request to the memory hierarchy at execute on behalf of the store
instruction. It appears to do the trick and to preserve correctness.
Still, I am clearly no expert in how memory works in gem5, so if this
sounds like a really stupid thing to do, please let me know. I guess
the "right" solution would be to create a new request type and insert
code to handle it in the proper places.
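
To make the expected gain concrete, here is a back-of-the-envelope timing
model of the idea. It is not gem5 code: the function name, its parameters
and the cycle numbers are all made up for illustration, and it assumes a
single store, a single cache level, no port contention, and that the
prefetched line is not evicted before the store writes back.

```python
def store_sq_release_cycle(execute_cycle, commit_cycle, miss_latency,
                           hit_latency=1, prefetch_at_execute=False):
    """Cycle at which a store that misses in the cache frees its SQ entry.

    Toy model only (hypothetical names, not gem5 API): one store, one
    cache level, no port contention, no eviction of the prefetched line.
    """
    if prefetch_at_execute:
        # The fill starts at execute and overlaps with the wait for commit.
        line_ready = execute_cycle + miss_latency
        writeback_start = max(commit_cycle, line_ready)
        # Once the line is present, the writeback is an ordinary hit.
        return writeback_start + hit_latency
    # Baseline: the miss is only discovered at writeback, after commit.
    return commit_cycle + miss_latency


# Illustrative numbers only: execute at cycle 10, commit at cycle 40,
# and a 100-cycle miss.
baseline = store_sq_release_cycle(10, 40, 100)
prefetched = store_sq_release_cycle(10, 40, 100, prefetch_at_execute=True)
print(baseline, prefetched)
```

In this model the saving is bounded by the distance between execute and
commit, so it only matters when the SQ is the bottleneck, as argued above.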

I also have another, x86-related question. In O3, the needsTSO variable
of the LSQ is set only when the ISA is x86. As I understand it, it
prevents more than one store from being written back per cycle. Yet if
two ports are available in a given cycle, why not write two stores? They
would still be written in order... wouldn't they?
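
To illustrate what I mean by "in order", here is a small sketch of a
store queue being drained oldest-first with and without a one-store-per-
cycle cap. Again, this is not gem5 code and every name in it is
hypothetical. It only models the order in which stores are sent to the
cache; it says nothing about the order in which they complete or become
visible in the memory system, which may well be the actual reason for the
restriction.

```python
from collections import deque


def drain_store_queue(committed_stores, ports_per_cycle,
                      max_stores_per_cycle=None):
    """Return (cycle, store) pairs in the order stores are sent to the cache.

    Hypothetical toy model: stores always leave from the head of the
    queue, and the per-cycle limit is the number of ports, optionally
    capped by max_stores_per_cycle (1 mimics the needsTSO behaviour
    described above).
    """
    limit = ports_per_cycle
    if max_stores_per_cycle is not None:
        limit = min(limit, max_stores_per_cycle)
    if limit < 1:
        raise ValueError("at least one store per cycle must be allowed")

    queue = deque(committed_stores)
    issued = []
    cycle = 0
    while queue:
        # Oldest stores first, up to the per-cycle limit.
        for _ in range(min(limit, len(queue))):
            issued.append((cycle, queue.popleft()))
        cycle += 1
    return issued


stores = ["st0", "st1", "st2", "st3"]
one_per_cycle = drain_store_queue(stores, ports_per_cycle=2,
                                  max_stores_per_cycle=1)
two_per_cycle = drain_store_queue(stores, ports_per_cycle=2)
print(one_per_cycle)
print(two_per_cycle)
```

The issue order is the same in both cases; only the cycle stamps differ.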
Thanks for reading,
Arthur Perais.