I've run into a buggy interaction in the O3 model for the ARM ISA between a TBH (or TBB) instruction and a dependent memory operation that gets squashed. It leads to erroneous behavior when the run is diffed against the Atomic model. The TBH instruction is a table-based branch that has to index into memory to calculate its branch destination, so it is both a branch and a memory op. The buggy behavior is as follows:
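As a rough illustration of why TBH is both a branch and a memory op, here is a minimal C++ sketch of its target computation. It assumes the Thumb-2 semantics from the ARM Architecture Reference Manual: TBH reads a halfword from Rn + Rm*2 (TBB reads a byte from Rn + Rm), and the target is the instruction's PC, which reads as its address + 4, plus twice the loaded entry. The `Memory` struct and the `tbhTarget`, `tbbTarget`, `loadUop`, and `branchUop` functions are hypothetical names for illustration, not gem5 code; the last two show how the same computation factors into a load followed by a branch that depends only on the loaded value.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flat little-endian memory, for illustration only (not gem5 code).
struct Memory {
    uint32_t base;               // address of bytes[0]
    std::vector<uint8_t> bytes;  // backing store
    uint8_t read8(uint32_t addr) const { return bytes.at(addr - base); }
    uint16_t read16(uint32_t addr) const {
        return static_cast<uint16_t>(read8(addr) | (read8(addr + 1) << 8));
    }
};

// Fused view of TBH [Rn, Rm, LSL #1]: the memory read and the branch target
// calculation happen in one instruction. In Thumb state PC reads as address + 4.
uint32_t tbhTarget(const Memory& mem, uint32_t instAddr, uint32_t rn, uint32_t rm) {
    uint32_t pc = instAddr + 4;
    uint16_t halfword = mem.read16(rn + (rm << 1));
    return pc + 2u * halfword;
}

// Fused view of TBB [Rn, Rm]: same idea with a byte-sized table entry.
uint32_t tbbTarget(const Memory& mem, uint32_t instAddr, uint32_t rn, uint32_t rm) {
    uint32_t pc = instAddr + 4;
    uint8_t byte = mem.read8(rn + rm);
    return pc + 2u * byte;
}

// Split view (hypothetical micro-ops): the load uop writes a temp register...
uint32_t loadUop(const Memory& mem, uint32_t rn, uint32_t rm) {
    return mem.read16(rn + (rm << 1));
}

// ...and the branch uop depends only on that temp, so its target is resolved
// after the load completes rather than inside the memory op itself.
uint32_t branchUop(uint32_t instAddr, uint32_t temp) {
    return instAddr + 4 + 2u * temp;
}
```

For a table at 0x1000 holding the halfwords 3, 16, and 32, a TBH at 0x2000 with Rn = 0x1000 and Rm = 1 loads 16 and branches to 0x2004 + 32 = 0x2024; the fused and split views agree.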
1) Fetch a TBH and predict its branch destination.
2) Begin fetching from the predicted PC (which happens to be correct in my buggy run).
3) Issue a younger dependent memory op to the LSQ and send its request to the cache ahead of the TBH, which is still waiting on its register operands.
4) Issue the TBH to the LSQ to read memory for its branch destination.
5) Detect a memory-ordering violation with the younger instruction and squash for memory ordering. This squash calls squashDueToMemOrder(...), which redirects Fetch to a stale PC value stored in the TBH dyn-inst object, since the TBH hasn't yet calculated its true PC.
6) Start fetching down the wrong path.
7) The TBH completes, but since its branch part was predicted correctly, no additional squash happens in checkMisprediction (which may not even be checked, due to the already outstanding squash).

I see two ways to fix this: either hack up the O3 model to handle this case of a fused memory-op and branch instruction (recheck whether to squash when the TBH finally resolves, for the special case where squashing a dependent memory op caused Fetch to screw up the branch), or split the instruction into 2 micro-ops (the load and then a dependent branch). Which one do people think would be the better option? I'm currently leaning toward micro-coding the instruction.

Thanks,
Geoff Blake

_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev