On Mon, 26 Dec 2011, Nilay Vaish wrote:
I think I have figured out where the problem is. In revision 8500, Gabe added
a check so that a microop that changes state which also affects the fetch
stage must be marked isSquashAfter. As I read the code, isSquashAfter causes
all microops after this microop to be squashed and refetched. If the current
microop changes the CPL, then refetching the instruction should result in a
fault.
Take a look at the trace below.
5145127448500: system.cpu.commit: Trying to commit head instruction, [sn:1122692274] [tid:0]
5145127448500: system.cpu.commit: Committing instruction with [sn:1122692274] PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53)
5145127421500: system.cpu + A0 T0 : @iret_label.52 : IRET_PROT : wrdl %ctrl140, t8, t2 : IntAlu : D=0x000000000000abd3 FetchSeq=1122692274 CPSeq=992061075
5145127448500: system.cpu.rob: [tid:0]: Retiring head instruction, instruction PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53), [sn:1122692274]
5145127448500: system.cpu: Removing committed instruction [tid:0] PC (0xffffffff80209ae8=>0xffffffff80209aea).(52=>53) [sn:1122692274]
5145127448500: system.cpu.rob: Starting to squash within the ROB.
5145127448500: system.cpu.rob: [tid:0]: Squashing instructions until [sn:1122692274].
5145127448500: system.cpu.rob: [tid:0]: Squashing instruction PC (0xffffffff80209ae8=>0xffffffff80209aea).(53=>54), seq num 1122692275.
5145127448500: system.cpu.rob: Reached head of instruction list while squashing.
All of a sudden the CPU starts squashing all the microops, even though there
was no branch misprediction or memory dependence misprediction. From reading
the code in commit_impl.hh, it seems that the head microop (wrdl) is marked
isSquashAfter, and hence all younger microops get squashed.
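To make the commit behavior concrete, here is a minimal C++ sketch of what happens when the head microop is flagged isSquashAfter. MicroOp and commitHead() are hypothetical names in a much simplified in-order ROB model, not gem5's actual O3 classes (the real logic lives in commit_impl.hh); fetch redirection is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified ROB entry; not gem5's DynInst.
struct MicroOp {
    uint64_t seqNum;   // fetch sequence number ([sn:...] in the trace)
    unsigned microPC;  // micro PC within the macroop (52, 53, ...)
    bool squashAfter;  // stands in for StaticInst::isSquashAfter()
};

// Retire the head microop. If it is flagged squashAfter, every younger
// microop still in the ROB is squashed, whether or not it belongs to the
// same macroop; fetch would then restart at the next PC (not modeled).
// Returns the number of squashed microops.
std::size_t commitHead(std::deque<MicroOp>& rob) {
    if (rob.empty()) return 0;
    const MicroOp head = rob.front();
    rob.pop_front();
    // Younger entries always carry larger sequence numbers.
    assert(rob.empty() || rob.front().seqNum > head.seqNum);
    if (!head.squashAfter) return 0;
    const std::size_t squashed = rob.size();
    rob.clear();
    return squashed;
}
```

With the ROB holding {1122692274, 52, true} and {1122692275, 53, false}, commitHead() retires the wrdl and squashes the microop at micro PC 53, leaving the ROB empty, which lines up with the "Reached head of instruction list while squashing" line in the trace.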
This is what happens some time later.
5145127449500: system.cpu.fetch: [tid:0]: Issuing a pipelined I-cache access, starting at PC (0xffffffff80209aea=>0xffffffff80209af2).(0=>1).
5145127449500: system.cpu.fetch: [tid:0] Fetching cache line 0xffffffff80209ac0 for addr 0xffffffff80209ae8
5145127449500: system.cpu.itb: Translating vaddr 0xffffffff80209ac0.
5145127449500: system.cpu.itb: In protected mode.
5145127449500: system.cpu.itb: Paging enabled.
5145127449500: system.cpu.itb: Matched vaddr 0xffffffff80209ac0 to entry starting at 0xffffffff80200000 with size 0x200000.
5145127449500: system.cpu.itb: Entry found with paddr 0x200000, doing protection checks.
5145127449500: system.cpu.itb: Trying to access kernel mode page from user mode.
5145127449500: system.cpu: CPU already running.
5145127449500: system.cpu.fetch: [tid:0] Got back req with addr 0xffffffff80209ac0 but expected 0xffffffff80209ac0
5145127449500: system.cpu.fetch: [tid:0]: Translation faulted, building noop.
5145127449500: global: DynInst: [sn:1122692293] Instruction created. Instcount for system.cpu = 156
5145127449500: system.cpu.fetch: [tid:0]: Instruction PC 0xffffffff80209aea (0) created [sn:1122692293].
5145127449500: system.cpu.fetch: [tid:0]: Instruction is: NOP
5145127449500: system.cpu.fetch: Activity this cycle.
5145127449500: system.cpu.fetch: [tid:0]: Blocked, need to handle the trap.
5145127449500: system.cpu.fetch: [tid:0]: fault (Page-Fault) detected @ PC (0xffffffff80209aea=>0xffffffff80209af2).(0=>1).
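The ITB lines show why the refetch faults: assuming, as suggested above, that the wrdl has moved the CPU to user mode (CPL 3), fetching from the kernel's 2 MB page fails the protection check. Below is a minimal sketch of just the x86 user/supervisor rule. TlbEntry, FetchResult and checkFetch() are hypothetical names, not gem5's X86ISA TLB code, and the model ignores every other check the real TLB performs (present bit, NX, SMEP, and so on).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified TLB entry; not gem5's X86ISA::TlbEntry.
struct TlbEntry {
    uint64_t vaddr;  // base virtual address of the mapping
    uint64_t size;   // mapping size in bytes (0x200000 = 2 MB page)
    bool user;       // U/S bit: true if CPL 3 accesses are allowed
};

enum class FetchResult { Ok, PageFault };

// Models only the user/supervisor part of the protection check:
// code running at CPL 3 may not touch a supervisor-only page.
FetchResult checkFetch(const TlbEntry& entry, uint64_t vaddr, unsigned cpl) {
    // The caller is expected to have matched vaddr to this entry already.
    assert(vaddr - entry.vaddr < entry.size);
    if (cpl == 3 && !entry.user) {
        return FetchResult::PageFault;
    }
    return FetchResult::Ok;
}
```

For the entry in the trace (base 0xffffffff80200000, size 0x200000, kernel-only), checkFetch(entry, 0xffffffff80209ac0, 3) returns FetchResult::PageFault, while the same fetch at CPL 0 returns FetchResult::Ok.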
I think we need to do two things:
* Do not squash all the microops; squash only the ones that do not belong to
this instruction.
* Do not fetch while a serializing instruction is executing. Intel's manual
states this as well.
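A rough C++ sketch of the first suggestion: after a squashAfter microop commits, keep the younger microops that finish the same macroop and squash only from the next instruction onward. MicroOp and commitHeadSelective() are hypothetical names in a simplified ROB model, not gem5 code; the sketch assumes each microop carries a flag like StaticInst::isLastMicroop() marking the end of its macroop, and it does not model the second suggestion (holding fetch behind a serializing instruction).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified ROB entry; not gem5's DynInst.
struct MicroOp {
    uint64_t seqNum;   // fetch sequence number
    unsigned microPC;  // micro PC within the macroop
    bool squashAfter;  // stands in for StaticInst::isSquashAfter()
    bool lastMicroop;  // stands in for StaticInst::isLastMicroop()
};

// Retire the head microop. If it is flagged squashAfter, keep the younger
// microops that finish the same macroop and squash only those that follow.
// Returns the number of squashed microops.
std::size_t commitHeadSelective(std::deque<MicroOp>& rob) {
    if (rob.empty()) return 0;
    const MicroOp head = rob.front();
    rob.pop_front();
    if (!head.squashAfter) return 0;

    // Count the younger entries that still belong to the head's macroop.
    // If its last microop is not in the ROB yet, everything is kept.
    std::size_t keep = 0;
    if (!head.lastMicroop) {
        while (keep < rob.size()) {
            const bool endsMacroop = rob[keep].lastMicroop;
            ++keep;
            if (endsMacroop) break;
        }
    }

    const std::size_t squashed = rob.size() - keep;
    rob.erase(rob.begin() + static_cast<std::ptrdiff_t>(keep), rob.end());
    assert(rob.size() == keep);
    return squashed;
}
```

Using an end-of-macroop flag rather than comparing macro PCs avoids confusing a later dynamic instance of the same instruction (for example, in a tight loop) with the current one.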
--
Nilay
I think I spoke too soon. I do not see any of the faults generated by failed
protection checks making it to the commit stage of the O3 CPU. So whatever I
wrote in my previous email may not hold at all.
--
Nilay
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev