I think the microcode that's executing is in
src/arch/x86/isa/insts/romutil.py, and it looks like your stack is bad.
That's where the vectoring microcode checks that it will be able to write
out the interrupt stack frame, and apparently it can't. That triggers
another page fault, which hits the same problem, and so on. You'll need to
determine why your stack ends up out of whack, or whether that code is
handling the stack in a subtly incorrect way that makes it fault when it
shouldn't.
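
As a rough sanity check on the trace quoted below, here is a small Python
sketch that redoes the stack arithmetic from the micro-ops (andi with 0xf0
on the low byte, then subi 0x30) and decodes the #PF error code. The
constants are copied from the trace; the function names are made up for
illustration, and the bit meanings are the standard x86 page-fault error
code bits, not anything specific to gem5.

```python
# Values copied from the Exec/Faults trace quoted in this thread.
RSP = 0xfffffe0000002000         # result of "mov t6, t6, rsp"
FAULT_ADDR = 0xfffffe0000001fd0  # address in "#PF(0x3) at ..."
ERROR_CODE = 0x3                 # error code in "#PF(0x3)"


def probe_address(rsp, frame_bytes=0x30):
    """Mirror the traced micro-ops: 16-byte align, then reserve the frame.

    "andi t6b, t6b, 0xf0" clears the low four bits of the low byte, which
    is the same as aligning the whole value down to 16 bytes.
    """
    aligned = rsp & ~0xf
    return aligned - frame_bytes


def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # 1: protection violation on a mapped page
        "write": bool(code & 0x2),    # 1: the access was a write
        "user": bool(code & 0x4),     # 1: the access came from user mode
    }


if __name__ == "__main__":
    print(hex(probe_address(RSP)))
    print(decode_pf_error(ERROR_CODE))
```

The computed address matches the faulting address, so the fault is on the
stack-frame probe itself. If I am reading the bits right, 0x3 means a
supervisor write to a page that is present but not writable, which would
point at permissions on that stack page rather than a missing mapping.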

Gabe

On Mon, Nov 12, 2018 at 8:25 AM Kleovoulos Kalaitzidis <
[email protected]> wrote:

> Hello,
> just to give more detail, I have attached part of the simout file, covering
> the output up to the first appearance of the page fault, which then keeps
> recurring.
>
> --
> Kleovoulos Kalaitzidis
> Doctorant - Équipe PACAP
>
> Centre de recherche INRIA Rennes - Bretagne Atlantique
> Bâtiment 12E, Bureau E321, Campus de Beaulieu,
> 35042 Rennes Cedex, France
>
> ------------------------------
>
> *From: *"Kleovoulos Kalaitzidis" <[email protected]>
> *To: *"gem5 users mailing list" <[email protected]>
> *Sent: *Monday, November 12, 2018 4:09:56 PM
> *Subject: *[gem5-users] Microcode_ROM page fault not handled
>
> Hello everyone,
>
> I am currently using FS mode to simulate and execute SPEC benchmarks. The
> image I use is Ubuntu 16.04 and the kernel I built for it is
> vmlinux-4-15.
> To set up the FS simulation environment, create the image file and
> build the kernel, I followed Jason's instructions from here:
> http://www.lowepower.com/jason/setting-up-gem5-full-system.html
> I run my simulations with x86 and I have already taken some checkpoints
> for FS, so now I restore them to execute the benchmarks. However, after
> some testing I found that most benchmarks, some time after the restore,
> execute infinite loops of micro-ops without making any progress: the
> number of executed instructions (which I print during execution) stops
> changing.
>
> *The gem5 command to restore the first checkpoint is:*
> /build/X86/gem5.opt --redirect-stdout --redirect-stderr --outdir=/outdir
>     /configs/example/fs.py --cpu-type=DerivO3CPU -n 1 --caches --l2cache
>     --mem-type=DDR4_2400_16x4 --mem-size=8GB --sys-clock=4GHz
>     --cpu-clock=4GHz --kernel=/path_to_kernel/vmlinux-4-15
>     --disk-image=/path_to_image/ubuntu-min-16-04.img
>     --checkpoint-dir=/path_to_checkpoint_dir/ -r 1
>
> To track down the problem I located the recurring loop of micro-ops and
> saw that it keeps executing micro-ops belonging to the instruction
> *Microcode_ROM*.
> After some searching I found this older thread, where someone else had a
> very similar problem:
> https://www.mail-archive.com/[email protected]/msg13058.html
>
> So I followed the same approach: I used
> --debug-flags=Exec,LocalApic,Faults and I get this output:
>
> 32985546164250: system.switch_cpus T0 : @__do_page_fault+716.32930 :
>  Microcode_ROM : ldst   t0, HS:[t6] : MemRead :  A=0xfffffe0000001fd0
> 32985546172500: system.switch_cpus T0 : @__do_page_fault+716.32890 :
>  Microcode_ROM : slli   t4, t1, 0x4 : IntAlu :  D=0x00000000000000e0
> 32985546172750: system.switch_cpus T0 : @__do_page_fault+716.32891 :
>  Microcode_ROM : ld   t2, IDTR:[t4 + 0x8] : MemRead :  D=0x00000000ffffffff
> A=0xfffffe00000000e8
> 32985546173000: system.switch_cpus T0 : @__do_page_fault+716.32892 :
>  Microcode_ROM : ld   t4, IDTR:[t4] : MemRead :  D=0x81a08e00001015d0
> A=0xfffffe00000000e0
> 32985546173250: system.switch_cpus T0 : @__do_page_fault+716.32893 :
>  Microcode_ROM : chks   , t4b, 0x3 : IntAlu :
> 32985546173500: system.switch_cpus T0 : @__do_page_fault+716.32894 :
>  Microcode_ROM : srli   t10, t4, 0x10 : IntAlu :  D=0x000081a08e000010
> 32985546173750: system.switch_cpus T0 : @__do_page_fault+716.32895 :
>  Microcode_ROM : andi   t5, t10, 0xf8 : IntAlu :  D=0x0000000000000010
> 32985546174000: system.switch_cpus T0 : @__do_page_fault+716.32896 :
>  Microcode_ROM : andi   t0w, t10w, 0x4 : IntAlu :  D=0x0000000000000020
> 32985546174250: system.switch_cpus T0 : @__do_page_fault+716.32897 :
>  Microcode_ROM : br   0x8084 : No_OpClass :
> 32985546176500: system.switch_cpus T0 : @__do_page_fault+716.32900 :
>  Microcode_ROM : ld   t3, TSG:[t5] : MemRead :  D=0x00af9b000000ffff
> A=0xfffffe0000001010
> 32985546176750: system.switch_cpus T0 : @__do_page_fault+716.32901 :
>  Microcode_ROM : chks   , t3, 0x7 : IntAlu :
> 32985546177000: system.switch_cpus T0 : @__do_page_fault+716.32902 :
>  Microcode_ROM : wrdl   %ctrl145, t3, t10 : IntAlu :  D=0x000000000000abd0
> 32985546177250: system.switch_cpus T0 : @__do_page_fault+716.32903 :
>  Microcode_ROM : wrdh   t9, t4, t2 : IntAlu :  D=0xffffffff81a015d0
> 32985546177500: system.switch_cpus T0 : @__do_page_fault+716.32904 :
>  Microcode_ROM : rdsel   t11b, t11b, %ctrl128 : IntAlu :
> D=0x0000000000000000
> 32985546177750: system.switch_cpus T0 : @__do_page_fault+716.32905 :
>  Microcode_ROM : rdattr   t10, %ctrl184,  : IntAlu :  D=0x000000000000abd0
> 32985546178000: system.switch_cpus T0 : @__do_page_fault+716.32906 :
>  Microcode_ROM : andi   t10, t10, 0x3 : IntAlu :  D=0x0000000000000000
> 32985546178250: system.switch_cpus T0 : @__do_page_fault+716.32907 :
>  Microcode_ROM : rdattr   t5, %ctrl179,  : IntAlu :  D=0x000000000000abd0
> 32985546178500: system.switch_cpus T0 : @__do_page_fault+716.32908 :
>  Microcode_ROM : andi   t5, t5, 0x3 : IntAlu :  D=0x0000000000000000
> 32985546178750: system.switch_cpus T0 : @__do_page_fault+716.32909 :
>  Microcode_ROM : sub   t0, t5, t10 : IntAlu :  D=0x0000000000000020
> 32985546179000: system.switch_cpus T0 : @__do_page_fault+716.32910 :
>  Microcode_ROM : mov   t11b, t0b, t0b : IntAlu :  D=0x0000000000000000
> 32985546179250: system.switch_cpus T0 : @__do_page_fault+716.32911 :
>  Microcode_ROM : srli   t12, t4, 0x20 : IntAlu :  D=0x0000000081a08e00
> 32985546179500: system.switch_cpus T0 : @__do_page_fault+716.32912 :
>  Microcode_ROM : andi   t12, t12, 0x7 : IntAlu :  D=0x0000000000000000
> 32985546179750: system.switch_cpus T0 : @__do_page_fault+716.32913 :
>  Microcode_ROM : subi   t0, t12, 0x1 : IntAlu :  D=0x0000000000000008
> 32985546180000: system.switch_cpus T0 : @__do_page_fault+716.32914 :
>  Microcode_ROM : br   0x8096 : No_OpClass :
> 32985546215500: system.switch_cpus T0 : @__do_page_fault+716.32915 :
>  Microcode_ROM : br   0x8098 : No_OpClass :
> 32985546217500: system.switch_cpus T0 : @__do_page_fault+716.32916 :
>  Microcode_ROM : mov   t6, t6, rsp : IntAlu :  D=0xfffffe0000002000
> 32985546217750: system.switch_cpus T0 : @__do_page_fault+716.32917 :
>  Microcode_ROM : br   0x8099 : No_OpClass :
> 32985546219750: system.switch_cpus T0 : @__do_page_fault+716.32921 :
>  Microcode_ROM : andi   t6b, t6b, 0xf0 : IntAlu :  D=0xfffffe0000002000
> 32985546220000: system.switch_cpus T0 : @__do_page_fault+716.32922 :
>  Microcode_ROM : subi   t6, t6, 0x30 : IntAlu :  D=0xfffffe0000001fd0
> 32985546220250: system.switch_cpus T0 : @__do_page_fault+716.32923 :
>  Microcode_ROM : wrip   , t0, t9 : IntAlu :
> 32985546222250: system.switch_cpus T0 : @__do_page_fault+716.32924 :
>  Microcode_ROM : srli   t5, t4, 0x10 : IntAlu :  D=0x000081a08e000010
> 32985546222500: system.switch_cpus T0 : @__do_page_fault+716.32925 :
>  Microcode_ROM : andi   t5, t5, 0xff : IntAlu :  D=0x0000000000000010
> 32985546222750: system.switch_cpus T0 : @__do_page_fault+716.32926 :
>  Microcode_ROM : wrdl   %ctrl140, t3, t5 : IntAlu :  D=0x000000000000abd0
> 32985546226500: system.switch_cpus T0 : @__do_page_fault+716.32927 :
>  Microcode_ROM : limm   t10, 0 : IntAlu :  D=0x0000000000000000
> 32985546226750: system.switch_cpus T0 : @__do_page_fault+716.32928 :
>  Microcode_ROM : rdsel   t10w, t10w, %ctrl127 : IntAlu :
> D=0x0000000000000010
> 32985546227000: system.switch_cpus T0 : @__do_page_fault+716.32929 :
>  Microcode_ROM : wrsel   %ctrl127, t5w,  : IntAlu :  D=0x0000000000000010
> 32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: #PF(0x3) at
> 0xfffffe0000001fd0
>
> This page fault keeps recurring and execution never continues. For some
> benchmarks it happens soon after restoring the checkpoint, for others it
> happens later, and for some it never appears at all.
> I should also mention that the checkpoint I restore was taken a
> reasonable time after the start of benchmark execution (at around 2% of
> committed instructions) using AtomicSimpleCPU. I then restore with
> DerivO3CPU or another CPU type of mine, always derived from DerivO3CPU.
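
One way to confirm that it really is the same fault repeating is to count
identical Page-Fault lines in the simout text. This is only a sketch, not a
gem5 tool: the regular expression assumes each fault is printed on a single
line in the format shown in the trace above, and the function names and the
threshold are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g.
# "32985546231500: Page-Fault: RIP 0x...: vector 14: #PF(0x3) at 0x..."
FAULT_RE = re.compile(
    r"(\d+):\s+Page-Fault:\s+RIP\s+(0x[0-9a-fA-F]+):\s+vector\s+(\d+):"
    r"\s+#PF\((0x[0-9a-fA-F]+)\)\s+at\s+(0x[0-9a-fA-F]+)"
)

# The fault line from the trace, joined back onto one line.
LINE = (
    "32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: "
    "#PF(0x3) at 0xfffffe0000001fd0"
)


def count_faults(text):
    """Count page faults keyed by (rip, error_code, address)."""
    counts = Counter()
    for match in FAULT_RE.finditer(text):
        _tick, rip, _vector, err, addr = match.groups()
        counts[(rip, err, addr)] += 1
    return counts


def looping_faults(text, threshold=3):
    """Return the fault keys seen at least `threshold` times."""
    return [key for key, n in count_faults(text).items() if n >= threshold]


if __name__ == "__main__":
    # Stand-in for the real simout contents: the same fault three times.
    sample = (LINE + "\n") * 3
    for rip, err, addr in looping_faults(sample):
        print(f"repeating #PF({err}) at {addr}, RIP {rip}")
```

If one (RIP, error code, address) triple dominates the counts, the CPU is
stuck re-taking a single fault rather than hitting many different ones.
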
>
> I am sorry for the long email; I tried to be as descriptive and
> comprehensive as possible. I would really appreciate your help, because my
> knowledge of gem5 is not enough to solve this. I look forward to hearing
> from anyone who has an idea.
> Thank you very much in advance.
>
> --
> Kleovoulos Kalaitzidis
> Doctorant - Équipe PACAP
>
> Centre de recherche INRIA Rennes - Bretagne Atlantique
> Bâtiment 12E, Bureau E321, Campus de Beaulieu,
> 35042 Rennes Cedex, France
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
