Re: [gem5-users] Microcode_ROM page fault not handled

Kleovoulos Kalaitzidis Mon, 12 Nov 2018 17:31:57 -0800

Hello,
thank you very much for your answer, Gabe. I see what you mean about the stack
looking bad, and I have been trying to investigate why.
As a quick experiment (prompted by the kernel-related problem in the thread I
mentioned earlier:
https://www.mail-archive.com/[email protected]/msg13058.html )
I built another kernel version, 4.8.13, and restored my checkpoints with it.
Part of the output is attached. The Microcode_ROM again keeps repeating a page
fault at the same address as before: 0xfffffe0000001fd0. This time, though, it
looks more specific to me, since it is associated with the kernel function
"wake_up_new_task", which I found is called from "do_fork". I cannot really
understand why the stack misbehaves for some of the benchmarks, since I take
all my checkpoints and restore them in the same way. If this output, compared
with the previous one, gives anyone an idea, please let me know. Thank you very
much for your help.
-- 
Kleovoulos Kalaitzidis 
Doctorant - Équipe PACAP


Centre de recherche INRIA Rennes - Bretagne Atlantique 
Bâtiment 12E, Bureau E321, Campus de Beaulieu, 
35042 Rennes Cedex, France 

> From: "Gabe Black" <[email protected]>
> To: "gem5 users mailing list" <[email protected]>
> Sent: Monday, November 12, 2018 10:48:19 PM
> Subject: Re: [gem5-users] Microcode_ROM page fault not handled

> The microcode that's executing is in src/arch/x86/isa/insts/romutil.py, I
> think, and it looks like your stack is bad. That's where the vectoring
> microcode checks to see that it will be able to write out the interrupt stack
> frame, and it apparently can't. That triggers another page fault, and it has
> the same problem. You'll need to determine why your stack ends up out of
> whack, or why that code might not be handling the stack in an exactly correct
> way, which makes it fault when it shouldn't.
> Gabe
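
The faulting address can be reproduced from the stack-related micro-ops in the quoted trace further down. The sketch below only replays that arithmetic; the function name is made up for illustration, and reading 0x30 as six 8-byte slots (SS, RSP, RFLAGS, CS, RIP, error code) is an interpretation based on the x86-64 exception frame layout, not something stated in the thread.

```python
def frame_probe_address(rsp: int, frame_bytes: int = 0x30) -> int:
    """Replay the stack micro-ops seen in the trace (illustrative helper).

    andi t6b, t6b, 0xf0  -> clear the low 4 bits of the low byte
    subi t6, t6, 0x30    -> step down by the assumed frame size
    """
    aligned = (rsp & ~0xFF) | (rsp & 0xF0)
    return aligned - frame_bytes


if __name__ == "__main__":
    rsp = 0xFFFFFE0000002000  # D= value of "mov t6, t6, rsp" in the trace
    print(hex(frame_probe_address(rsp)))  # 0xfffffe0000001fd0
```

The result equals the address of the "ldst t0, HS:[t6]" access and of the reported #PF, which is consistent with the fault coming from the check that the frame can be written.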

> On Mon, Nov 12, 2018 at 8:25 AM Kleovoulos Kalaitzidis <[email protected]>
> wrote:

>> Hello,
>> to give more detail, I have attached part of the simout file from just
>> before the first appearance of the page fault, which then keeps repeating.

>>> From: "Kleovoulos Kalaitzidis" <[email protected]>
>>> To: "gem5 users mailing list" <[email protected]>
>>> Sent: Monday, November 12, 2018 4:09:56 PM
>>> Subject: [gem5-users] Microcode_ROM page fault not handled

>>> Hello everyone,

>>> I am currently using FS mode to simulate and execute SPEC benchmarks. The
>>> image I use is Ubuntu 16.04, and the kernel I built for it is vmlinux-4-15.
>>> To set up the FS simulation environment, create the image file and build the
>>> kernel, I followed Jason's instructions here:
>>> http://www.lowepower.com/jason/setting-up-gem5-full-system.html
>>> I run my simulations on x86 and have already taken some FS checkpoints,
>>> which I now restore to execute the benchmarks. However, after some testing I
>>> found that most of them, some time after the restore, execute infinite loops
>>> of micro-ops without making progress in the benchmark: the number of
>>> executed instructions, which I print during execution, stops changing.

>>> The gem5 command to restore the first checkpoint is:
>>> /build/X86/gem5.opt --redirect-stdout --redirect-stderr --outdir=/outdir
>>> /configs/example/fs.py --cpu-type=DerivO3CPU -n 1 --caches --l2cache
>>> --mem-type=DDR4_2400_16x4 --mem-size=8GB --sys-clock=4GHz --cpu-clock=4GHz
>>> --kernel=/path_to_kernel/vmlinux-4-15
>>> --disk-image=/path_to_image/ubuntu-min-16-04.img
>>> --checkpoint-dir=/path_to_checkpoint_dir/ -r 1
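
For scripted restores, the command above can be assembled as an argument list. This is only a sketch: every flag and placeholder path is copied from the message, the function name build_restore_argv is made up for illustration, and the optional debug_flags parameter assumes that --debug-flags (which the message goes on to use) is passed with the other gem5 options ahead of the config script.

```python
def build_restore_argv(debug_flags=None):
    """Return the restore command from the message as an argv list."""
    argv = ["/build/X86/gem5.opt"]  # path exactly as written in the message
    if debug_flags:
        # Assumption: gem5 options go before the config script.
        argv.append("--debug-flags=" + ",".join(debug_flags))
    argv += [
        "--redirect-stdout", "--redirect-stderr", "--outdir=/outdir",
        "/configs/example/fs.py",
        "--cpu-type=DerivO3CPU", "-n", "1", "--caches", "--l2cache",
        "--mem-type=DDR4_2400_16x4", "--mem-size=8GB",
        "--sys-clock=4GHz", "--cpu-clock=4GHz",
        "--kernel=/path_to_kernel/vmlinux-4-15",            # placeholder path
        "--disk-image=/path_to_image/ubuntu-min-16-04.img",  # placeholder path
        "--checkpoint-dir=/path_to_checkpoint_dir/",         # placeholder path
        "-r", "1",
    ]
    return argv


if __name__ == "__main__":
    print(" ".join(build_restore_argv(["Exec", "LocalApic", "Faults"])))
```

Keeping the list in one place makes it easy to rerun the same restore with and without tracing.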

>>> To track down the problem I located the recurring loop of micro-ops and saw
>>> that they all belong to the instruction Microcode_ROM. After some searching
>>> I found this older thread, where someone else had a quite similar problem:
>>> https://www.mail-archive.com/[email protected]/msg13058.html

>>> So I followed the same approach: I used --debug-flags=Exec,LocalApic,Faults
>>> and I get this output:

>>> 32985546164250: system.switch_cpus T0 : @__do_page_fault+716.32930 :
>>> Microcode_ROM : ldst t0, HS:[t6] : MemRead : A=0xfffffe0000001fd0
>>> 32985546172500: system.switch_cpus T0 : @__do_page_fault+716.32890 :
>>> Microcode_ROM : slli t4, t1, 0x4 : IntAlu : D=0x00000000000000e0
>>> 32985546172750: system.switch_cpus T0 : @__do_page_fault+716.32891 :
>>> Microcode_ROM : ld t2, IDTR:[t4 + 0x8] : MemRead : D=0x00000000ffffffff
>>> A=0xfffffe00000000e8
>>> 32985546173000: system.switch_cpus T0 : @__do_page_fault+716.32892 :
>>> Microcode_ROM : ld t4, IDTR:[t4] : MemRead : D=0x81a08e00001015d0
>>> A=0xfffffe00000000e0
>>> 32985546173250: system.switch_cpus T0 : @__do_page_fault+716.32893 :
>>> Microcode_ROM : chks , t4b, 0x3 : IntAlu :
>>> 32985546173500: system.switch_cpus T0 : @__do_page_fault+716.32894 :
>>> Microcode_ROM : srli t10, t4, 0x10 : IntAlu : D=0x000081a08e000010
>>> 32985546173750: system.switch_cpus T0 : @__do_page_fault+716.32895 :
>>> Microcode_ROM : andi t5, t10, 0xf8 : IntAlu : D=0x0000000000000010
>>> 32985546174000: system.switch_cpus T0 : @__do_page_fault+716.32896 :
>>> Microcode_ROM : andi t0w, t10w, 0x4 : IntAlu : D=0x0000000000000020
>>> 32985546174250: system.switch_cpus T0 : @__do_page_fault+716.32897 :
>>> Microcode_ROM : br 0x8084 : No_OpClass :
>>> 32985546176500: system.switch_cpus T0 : @__do_page_fault+716.32900 :
>>> Microcode_ROM : ld t3, TSG:[t5] : MemRead : D=0x00af9b000000ffff
>>> A=0xfffffe0000001010
>>> 32985546176750: system.switch_cpus T0 : @__do_page_fault+716.32901 :
>>> Microcode_ROM : chks , t3, 0x7 : IntAlu :
>>> 32985546177000: system.switch_cpus T0 : @__do_page_fault+716.32902 :
>>> Microcode_ROM : wrdl %ctrl145, t3, t10 : IntAlu : D=0x000000000000abd0
>>> 32985546177250: system.switch_cpus T0 : @__do_page_fault+716.32903 :
>>> Microcode_ROM : wrdh t9, t4, t2 : IntAlu : D=0xffffffff81a015d0
>>> 32985546177500: system.switch_cpus T0 : @__do_page_fault+716.32904 :
>>> Microcode_ROM : rdsel t11b, t11b, %ctrl128 : IntAlu : D=0x0000000000000000
>>> 32985546177750: system.switch_cpus T0 : @__do_page_fault+716.32905 :
>>> Microcode_ROM : rdattr t10, %ctrl184, : IntAlu : D=0x000000000000abd0
>>> 32985546178000: system.switch_cpus T0 : @__do_page_fault+716.32906 :
>>> Microcode_ROM : andi t10, t10, 0x3 : IntAlu : D=0x0000000000000000
>>> 32985546178250: system.switch_cpus T0 : @__do_page_fault+716.32907 :
>>> Microcode_ROM : rdattr t5, %ctrl179, : IntAlu : D=0x000000000000abd0
>>> 32985546178500: system.switch_cpus T0 : @__do_page_fault+716.32908 :
>>> Microcode_ROM : andi t5, t5, 0x3 : IntAlu : D=0x0000000000000000
>>> 32985546178750: system.switch_cpus T0 : @__do_page_fault+716.32909 :
>>> Microcode_ROM : sub t0, t5, t10 : IntAlu : D=0x0000000000000020
>>> 32985546179000: system.switch_cpus T0 : @__do_page_fault+716.32910 :
>>> Microcode_ROM : mov t11b, t0b, t0b : IntAlu : D=0x0000000000000000
>>> 32985546179250: system.switch_cpus T0 : @__do_page_fault+716.32911 :
>>> Microcode_ROM : srli t12, t4, 0x20 : IntAlu : D=0x0000000081a08e00
>>> 32985546179500: system.switch_cpus T0 : @__do_page_fault+716.32912 :
>>> Microcode_ROM : andi t12, t12, 0x7 : IntAlu : D=0x0000000000000000
>>> 32985546179750: system.switch_cpus T0 : @__do_page_fault+716.32913 :
>>> Microcode_ROM : subi t0, t12, 0x1 : IntAlu : D=0x0000000000000008
>>> 32985546180000: system.switch_cpus T0 : @__do_page_fault+716.32914 :
>>> Microcode_ROM : br 0x8096 : No_OpClass :
>>> 32985546215500: system.switch_cpus T0 : @__do_page_fault+716.32915 :
>>> Microcode_ROM : br 0x8098 : No_OpClass :
>>> 32985546217500: system.switch_cpus T0 : @__do_page_fault+716.32916 :
>>> Microcode_ROM : mov t6, t6, rsp : IntAlu : D=0xfffffe0000002000
>>> 32985546217750: system.switch_cpus T0 : @__do_page_fault+716.32917 :
>>> Microcode_ROM : br 0x8099 : No_OpClass :
>>> 32985546219750: system.switch_cpus T0 : @__do_page_fault+716.32921 :
>>> Microcode_ROM : andi t6b, t6b, 0xf0 : IntAlu : D=0xfffffe0000002000
>>> 32985546220000: system.switch_cpus T0 : @__do_page_fault+716.32922 :
>>> Microcode_ROM : subi t6, t6, 0x30 : IntAlu : D=0xfffffe0000001fd0
>>> 32985546220250: system.switch_cpus T0 : @__do_page_fault+716.32923 :
>>> Microcode_ROM : wrip , t0, t9 : IntAlu :
>>> 32985546222250: system.switch_cpus T0 : @__do_page_fault+716.32924 :
>>> Microcode_ROM : srli t5, t4, 0x10 : IntAlu : D=0x000081a08e000010
>>> 32985546222500: system.switch_cpus T0 : @__do_page_fault+716.32925 :
>>> Microcode_ROM : andi t5, t5, 0xff : IntAlu : D=0x0000000000000010
>>> 32985546222750: system.switch_cpus T0 : @__do_page_fault+716.32926 :
>>> Microcode_ROM : wrdl %ctrl140, t3, t5 : IntAlu : D=0x000000000000abd0
>>> 32985546226500: system.switch_cpus T0 : @__do_page_fault+716.32927 :
>>> Microcode_ROM : limm t10, 0 : IntAlu : D=0x0000000000000000
>>> 32985546226750: system.switch_cpus T0 : @__do_page_fault+716.32928 :
>>> Microcode_ROM : rdsel t10w, t10w, %ctrl127 : IntAlu : D=0x0000000000000010
>>> 32985546227000: system.switch_cpus T0 : @__do_page_fault+716.32929 :
>>> Microcode_ROM : wrsel %ctrl127, t5w, : IntAlu : D=0x0000000000000010
>>> 32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: #PF(0x3) at
>>> 0xfffffe0000001fd0
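
The two IDTR loads in the trace can be decoded by hand to see which gate the microcode is vectoring through. The sketch below assumes the standard x86-64 16-byte interrupt-gate layout; the function name and dictionary keys are made up for illustration, and the two input values are the D= fields of the loads at A=0xfffffe00000000e0 and A=0xfffffe00000000e8.

```python
def decode_idt_gate(low: int, high: int) -> dict:
    """Split a 16-byte x86-64 IDT gate into its fields (illustrative helper)."""
    handler = (
        (low & 0xFFFF)                    # offset bits 15..0
        | (((low >> 48) & 0xFFFF) << 16)  # offset bits 31..16
        | ((high & 0xFFFFFFFF) << 32)     # offset bits 63..32
    )
    return {
        "selector": (low >> 16) & 0xFFFF,
        "ist": (low >> 32) & 0x7,
        "type": (low >> 40) & 0xF,
        "dpl": (low >> 45) & 0x3,
        "present": bool((low >> 47) & 0x1),
        "handler": handler,
    }


if __name__ == "__main__":
    gate = decode_idt_gate(0x81A08E00001015D0, 0x00000000FFFFFFFF)
    print("vector  :", 0xE0 >> 4)  # "slli t4, t1, 0x4" gave 0xe0
    print("handler :", hex(gate["handler"]))
    print("selector:", hex(gate["selector"]), "ist:", gate["ist"])
```

The decoded handler address matches the D= value of the "wrdh t9, t4, t2" micro-op, and the IST field is 0, which agrees with "andi t12, t12, 0x7" producing 0 in the trace.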

>>> This page fault keeps recurring and execution never continues. For some
>>> benchmarks it happens shortly after restoring the checkpoint, for others it
>>> happens later, and for some it may never appear at all. I should also
>>> mention that the checkpoint I restore is taken a reasonable time after the
>>> benchmark starts executing (around 2% of committed instructions) using
>>> AtomicSimpleCPU. I then restore with DerivO3CPU or with another CPU type of
>>> my own, always derived from DerivO3CPU.
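
To confirm from a long simout that the same fault really is repeating, the Page-Fault lines can be counted per (RIP, address) pair. This is a rough sketch: the regular expression is modelled on the single fault line quoted above and assumes that each fault sits on one line in the real file (the email wrapped it), the error-code bits follow the standard x86 layout, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one Page-Fault line shown in the message.
FAULT_RE = re.compile(
    r"(\d+): Page-Fault: RIP (0x[0-9a-fA-F]+): vector (\d+): "
    r"#PF\((0x[0-9a-fA-F]+)\) at\s+(0x[0-9a-fA-F]+)"
)


def decode_pf_error(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # 1: protection violation, page mapped
        "write": bool(code & 0x2),    # 1: the access was a write
        "user": bool(code & 0x4),     # 1: the access came from user mode
    }


def count_faults(lines) -> Counter:
    """Count faults per (RIP, faulting address) pair."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(int(match.group(2), 16), int(match.group(5), 16))] += 1
    return counts


if __name__ == "__main__":
    sample = ("32985546231500: Page-Fault: RIP 0xffffffff81057b6c: "
              "vector 14: #PF(0x3) at 0xfffffe0000001fd0")
    for (rip, addr), n in count_faults([sample]).items():
        print(hex(rip), hex(addr), n)
    print(decode_pf_error(0x3))
```

Under that layout, the error code 0x3 in the quoted line describes a supervisor-mode write to a page that is present but not writable, rather than a missing mapping.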

>>> I am sorry for the long email; I tried to be as descriptive and
>>> comprehensive as possible. I would really appreciate your help, because my
>>> knowledge of gem5 is not enough to solve this. I look forward to hearing
>>> from anyone who has an idea.
>>> Thank you very much in advance.

>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users


simout
Description: Binary data
