OK, here is another shot at explaining what might be going wrong. I think this is what happens:

1. A sysret instruction is in execution. It commits some of its microops, including the one that writes the CS selector.

2. There is a pending interrupt, but it is not handled because cpu->instList() is non-empty.

3. One of the microops is marked isSquashAfter. So after this microop commits, all the microops after it are squashed. This empties cpu->instList().

4. The interrupt is handled, and a trap is scheduled.

5. The trap squashes sysret's microops (which were added again by fetch) and changes the PC to Microcode_ROM:<something>.

6. It seems this microcode saves the CS that was written by the sysret, but the saved instruction pointer still points to the sysret instruction, since that instruction has not yet completed.

7. When the iret instruction tries to restore the previously executing context (which is the sysret instruction), it restores the CS that the sysret instruction wrote. This CS has a user-mode privilege level, but rip points to the sysret instruction, which should be on a kernel-mode page since we are returning from a system call.

8. When the TLB lookup is performed to fetch the instruction that rip points to, the TLB generates a page fault, because a user-mode access is being made to a kernel-mode page. The faulting instruction reaches the commit stage of the processor, at which point (I think) the operating system realizes that something has gone wrong and prints the segfault message to the terminal.
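
The sequence above can be replayed in a toy model. This is only a minimal Python sketch, not gem5 code: ToyCpu, the microop names, the selector values, the addresses, and the "upper half is kernel-only" rule are all made up for illustration. Only the ordering of events follows the steps as I described them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

KERNEL_CS = 0x10                  # made-up selector, low two bits = 0
USER_CS = 0x33                    # made-up selector, low two bits = 3
SYSRET_RIP = 0xFFFFFFFF80001000   # made-up address on a kernel page
KERNEL_BASE = 0xFFFF800000000000  # toy rule: at or above this is kernel-only


@dataclass
class MicroOp:
    name: str
    writes_cs: bool = False
    squash_after: bool = False


@dataclass
class ToyCpu:
    cs: int = KERNEL_CS
    rip: int = SYSRET_RIP         # keeps pointing at sysret until it completes
    inst_list: List[MicroOp] = field(default_factory=list)
    interrupt_pending: bool = True
    saved: Optional[Tuple[int, int]] = None

    def commit_head(self) -> None:
        uop = self.inst_list.pop(0)
        if uop.writes_cs:
            self.cs = USER_CS             # step 1: CS write is committed
        if uop.squash_after:
            self.inst_list.clear()        # step 3: younger microops squashed

    def try_interrupt(self) -> bool:
        if not self.interrupt_pending or self.inst_list:
            return False                  # step 2: deferred while list is non-empty
        self.saved = (self.cs, self.rip)  # steps 4 to 6: trap saves CS and rip
        self.interrupt_pending = False
        return True


def replay():
    """Run the hypothetical sysret and return (taken_early, saved_context)."""
    cpu = ToyCpu(inst_list=[
        MicroOp("write_cs", writes_cs=True),
        MicroOp("serialize", squash_after=True),
        MicroOp("finish"),                # never commits in this replay
    ])
    taken_early = cpu.try_interrupt()
    while cpu.inst_list:
        cpu.commit_head()
    cpu.try_interrupt()
    return taken_early, cpu.saved


def fetch_faults(cs: int, rip: int) -> bool:
    """Steps 7 and 8: a CPL 3 fetch from a kernel-only page faults."""
    return (cs & 3) == 3 and rip >= KERNEL_BASE
```

In this model, replay() saves the user-mode CS together with the rip of the sysret itself, and fetch_faults() reports a fault for exactly that pair, which is the mismatch I am describing in steps 6 to 8.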


I think the fix remains the same: isSquashAfter should not squash all the microops after it, but only those that do not belong to this particular instruction. Or else, fetch should simply not fetch any more microops once it sees one that is marked isSquashAfter.
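
To make the two options concrete, here is a rough Python sketch of what each policy would leave in flight. It is not gem5 code: the Uop tuple, its macro field (an id for the macro instruction a microop belongs to), and both function names are hypothetical. It only shows the effect of each policy on a list of microops, not whether either one is sufficient in the real pipeline.

```python
from collections import namedtuple

# macro: hypothetical id of the macro instruction this microop belongs to
Uop = namedtuple("Uop", "macro name squash_after")


def squash_other_instructions(inst_list, committed):
    """Option 1: squash younger microops, except those that belong to
    the same macro instruction as the committing squash-after microop."""
    return [u for u in inst_list if u.macro == committed.macro]


def fetch_until_squash_after(stream):
    """Option 2: fetch stops right after the first microop that is
    marked squash-after, so nothing younger enters the list."""
    fetched = []
    for u in stream:
        fetched.append(u)
        if u.squash_after:
            break
    return fetched
```

With option 1, the remaining microops of the sysret stay in the list after the squash, so the list is not empty at that point. With option 2, nothing younger than the squash-after microop is ever fetched, so there is nothing to squash and fetch again.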

Any opinions on this?

--
Nilay
