Hello,
I should also mention that this problem occurs with a fresh pull of gem5 at
commit 2045a5c199c7c7597684c5d7501d5fb55aff9608, with no extra modifications.
I used the same repo to take checkpoints at uniform intervals for SPEC CPU
2017. I take the checkpoints sequentially with the "AtomicSimpleCPU" cpu-type:
a checkpoint is created at instruction "N", gem5 exits, and I then restore
checkpoint N to continue and take the next checkpoint at instruction
"N+period". This way, each checkpoint has been restored at least once and
works normally with AtomicSimpleCPU.

Here is the gem5 command for taking the checkpoints:

    ./build/X86/gem5.opt --redirect-stdout --redirect-stderr \
        --outdir=outdir /configs/example/fs.py \
        --cpu-type=AtomicSimpleCPU -n 1 \
        --mem-type=DDR4_2400_16x4 --mem-size=8GB --fastmem \
        --sys-clock=4GHz --cpu-clock=4GHz \
        --kernel=/path_to_kernel/vmlinux-4-15 \
        --disk-image=/path_to_image/ubuntu-min-16-04.img \
        --checkpoint-dir=/chpts_dir \
        --checkpoint-restore=N --at-instruction \
        --take-checkpoints=N+period --checkpoint-at-end
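
To make the sequential procedure concrete, here is a minimal Python sketch of how the restore/take pairs could be generated. The helper names (checkpoint_steps, build_command) are hypothetical and exist only for this illustration; the flags are copied from the command above and are passed exactly as I described them, without any claim about how gem5 interprets the values internally.

```python
import shlex

# Flags copied from the command above. The remaining options (memory, clocks,
# kernel, disk image, directories) are left out here only for brevity.
BASE_ARGS = [
    "./build/X86/gem5.opt",
    "/configs/example/fs.py",
    "--cpu-type=AtomicSimpleCPU",
    "--at-instruction",
    "--checkpoint-at-end",
]


def checkpoint_steps(first, period, count):
    """Return (restore_at, take_at) instruction counts for `count` runs.

    Each run restores the checkpoint taken at `restore_at` and takes the
    next one at `restore_at + period`.
    """
    return [(first + i * period, first + (i + 1) * period) for i in range(count)]


def build_command(restore_at, take_at):
    """Build one gem5 command line for a single restore-then-checkpoint run."""
    args = BASE_ARGS + [
        f"--checkpoint-restore={restore_at}",
        f"--take-checkpoints={take_at}",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the chaining.
    for restore_at, take_at in checkpoint_steps(1000, 500, 3):
        print(build_command(restore_at, take_at))
```

Each run's take_at value becomes the next run's restore_at value, which is why every checkpoint except the last gets restored once with AtomicSimpleCPU.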

I have followed the same process for both ARM and x86 using fs.py. However,
this stack-related problem does not show up for ARM, whereas for x86 it shows
up in many of the checkpoints taken for the different benchmarks. I can see
only two possible causes: either the way I created the equivalent Ubuntu image
for x86 (just following Jason's tutorial,
http://www.lowepower.com/jason/setting-up-gem5-full-system.html ), or a
problem with the way a checkpoint saves the current state when it is taken
for x86.

How is the stack saved and restored between checkpoints? (I assume it is
used/simulated when restoring with AtomicSimpleCPU.)
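
For reference, the faulting address in the trace quoted further down can be reproduced from the RSP value shown there. The sketch below is only my reading of the micro-op mnemonics in that trace (mov t6 from rsp, andi on the low byte with 0xf0, subi 0x30); I am assuming they do what their names suggest. The error-code decoding uses the standard x86 page-fault error-code bits (bit 0 present, bit 1 write, bit 2 user). The function names are hypothetical.

```python
def interrupt_frame_probe(rsp):
    """Address probed by the vectoring microcode, as read from the trace.

    Assumption: "andi t6b, t6b, 0xf0" masks only the low byte, which is the
    same as clearing the low 4 bits of the stack pointer, and "subi t6, t6,
    0x30" then moves down by 0x30 bytes.
    """
    aligned = rsp & ~0xF
    return aligned - 0x30


def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # 1: protection violation on a present page
        "write": bool(code & 0x2),    # 1: the access was a write
        "user": bool(code & 0x4),     # 1: the access came from user mode
    }


if __name__ == "__main__":
    print(hex(interrupt_frame_probe(0xFFFFFE0000002000)))
    print(decode_pf_error(0x3))
```

With the RSP value from the trace (0xfffffe0000002000) this gives 0xfffffe0000001fd0, the address reported in the Page-Fault line, and #PF(0x3) decodes to a supervisor-mode write to a page that is present.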

Here again is the gem5 command I use to restore the checkpoints with
DerivO3CPU:

    ./build/X86/gem5.opt -r -e -d /path_to_outdir \
        configs/example/fs.py --cpu-type=DerivO3CPU -n 1 \
        --caches --l2cache --l3cache \
        --mem-type=DDR4_2400_16x4 --mem-size=8GB \
        --sys-clock=4GHz --cpu-clock=4GHz --maxinsts=150000000 \
        --kernel=/path_to_kernel/vmlinux-4-15 \
        --disk-image=/path_to_image/ubuntu-min-16-04.img \
        --checkpoint-dir=/path_to_cpt_dir/ -r N --at-instruction

Since everything was done with the official gem5 code, I think that even if
the problem lies in my configuration (image, kernel) or in the SPEC
benchmarks, there should be a way to detect the situation and stop/exit gem5
when a page fault goes into an infinite loop. I would appreciate any feedback
on this point.
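
In the meantime, a possible workaround is to post-process the debug output. The following is a minimal sketch, not a gem5 feature: it scans a trace produced with the Faults debug flag and reports when the same RIP and faulting address appear in several consecutive Page-Fault lines. The regular expression is based only on the single Page-Fault line in the trace quoted below, so the exact format is an assumption, and find_fault_loop is a hypothetical name.

```python
import re

# Pattern modelled on the one Page-Fault line in the quoted trace, e.g.
#   Page-Fault: RIP 0xffffffff81057b6c: vector 14: #PF(0x3) at 0xfffffe0000001fd0
# "\s+" before the address tolerates a line wrap at that point.
FAULT_RE = re.compile(
    r"Page-Fault: RIP (0x[0-9a-fA-F]+): vector \d+: "
    r"#PF\(0x[0-9a-fA-F]+\) at\s+(0x[0-9a-fA-F]+)"
)


def find_fault_loop(trace_text, threshold=3):
    """Return (rip, address) if the same fault repeats `threshold` times in a row.

    Returns None when no such run is found. This is a heuristic: a genuine
    workload may fault repeatedly at one address, so the threshold needs tuning.
    """
    last = None
    run = 0
    for match in FAULT_RE.finditer(trace_text):
        key = (match.group(1), match.group(2))
        run = run + 1 if key == last else 1
        last = key
        if run >= threshold:
            return key
    return None


if __name__ == "__main__":
    line = ("32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: "
            "#PF(0x3) at\n0xfffffe0000001fd0\n")
    print(find_fault_loop(line * 5))
```

A wrapper script could run this periodically on the redirected simout file and kill the simulation when it returns a match.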

-- 
Kleovoulos Kalaitzidis 
Doctorant - Équipe PACAP 

Centre de recherche INRIA Rennes - Bretagne Atlantique 
Bâtiment 12E, Bureau E321, Campus de Beaulieu, 
35042 Rennes Cedex, France 

> From: "Kleovoulos Kalaitzidis" <[email protected]>
> To: "gem5 users mailing list" <[email protected]>
> Sent: Tuesday, November 13, 2018 2:31:30 AM
> Subject: Re: [gem5-users] Microcode_ROM page fault not handled

> Hello,
> thank you very much for your answer, Gabe. I see what you mean about the
> stack seeming to be bad, and I have been trying to investigate why.
> As a quick experiment (and prompted by the kernel-related problem in the
> thread I had mentioned:
> https://www.mail-archive.com/[email protected]/msg13058.html )
> I built another kernel version, 4.8.13, and restored my checkpoints with
> that one. You can find part of the output attached here. Again, the
> Microcode_ROM keeps repeating for a page fault at the same address as before:
> 0xfffffe0000001fd0. This time, though, it seems more specific to me, since it
> is related to the kernel function "wake_up_new_task", which I found is called
> from "do_fork". I cannot really understand why the stack misbehaves with
> some of the benchmarks, since I take and restore all my checkpoints in the
> same way. If this output, which differs from the previous one, gives anyone
> an idea, please let me know. Thank you very much for your help.

>> From: "Gabe Black" <[email protected]>
>> To: "gem5 users mailing list" <[email protected]>
>> Sent: Monday, November 12, 2018 10:48:19 PM
>> Subject: Re: [gem5-users] Microcode_ROM page fault not handled

>> The microcode that's executing is in src/arch/x86/isa/insts/romutil.py I
>> think, and it looks like your stack is bad. That's where the vectoring
>> microcode checks to see that it will be able to write out the interrupt
>> stack frame, and it apparently can't. That triggers another page fault, and
>> it has the same problem. You'll need to determine why your stack ends up out
>> of whack, or why that code might not be handling the stack in an exactly
>> correct way which makes it fault when it shouldn't.
>> Gabe

>> On Mon, Nov 12, 2018 at 8:25 AM Kleovoulos Kalaitzidis
>> <[email protected]> wrote:

>>> Hello,
>>> just to give more detail, I have attached part of the simout file from
>>> just before the first appearance of the page fault, which then keeps
>>> recurring.


>>>> From: "Kleovoulos Kalaitzidis" <[email protected]>
>>>> To: "gem5 users mailing list" <[email protected]>
>>>> Sent: Monday, November 12, 2018 4:09:56 PM
>>>> Subject: [gem5-users] Microcode_ROM page fault not handled

>>>> Hello everyone,

>>>> I am currently using FS mode to simulate and execute SPEC benchmarks. The
>>>> image I use is an Ubuntu-16.04 and the kernel I built for it is
>>>> vmlinux-4-15. To set up the FS simulation environment, create the image
>>>> file and build the kernel, I followed Jason's instructions from here:
>>>> http://www.lowepower.com/jason/setting-up-gem5-full-system.html
>>>> I run my simulations with x86 and I have already taken some checkpoints
>>>> for FS, so now I use them to restore and execute the benchmarks. However,
>>>> after some testing I found that most of them, some time after the restore,
>>>> execute infinite loops of micro-ops without making progress in the overall
>>>> benchmark execution: the number of executed instructions stops changing
>>>> (I checked this by printing it during execution).

>>>> The gem5 command to restore the first checkpoint is:
>>>> /build/X86/gem5.opt
>>>> --redirect-stdout --redirect-stderr --outdir=/outdir /configs/example/fs.py
>>>> --cpu-type=DerivO3CPU -n 1 --caches --l2cache --mem-type=DDR4_2400_16x4
>>>> --mem-size=8GB --sys-clock=4GHz --cpu-clock=4GHz
>>>> --kernel=/path_to_kernel/vmlinux-4-15
>>>> --disk-image=/path_to_image/ubuntu-min-16-04.img
>>>> --checkpoint-dir=/path_to_checkpoint_dir/ -r 1

>>>> To tackle the problem, I located the recurring loop of micro-ops and saw
>>>> that it keeps executing micro-ops belonging to the instruction
>>>> Microcode_ROM. After some searching I found this older thread where
>>>> someone else had a quite similar problem:
>>>> https://www.mail-archive.com/[email protected]/msg13058.html

>>>> So I followed the same approach: I used
>>>> --debug-flags=Exec,LocalApic,Faults and I get this output:

>>>> 32985546164250: system.switch_cpus T0 : @__do_page_fault+716.32930 :
>>>> Microcode_ROM : ldst t0, HS:[t6] : MemRead : A=0xfffffe0000001fd0
>>>> 32985546172500: system.switch_cpus T0 : @__do_page_fault+716.32890 :
>>>> Microcode_ROM : slli t4, t1, 0x4 : IntAlu : D=0x00000000000000e0
>>>> 32985546172750: system.switch_cpus T0 : @__do_page_fault+716.32891 :
>>>> Microcode_ROM : ld t2, IDTR:[t4 + 0x8] : MemRead : D=0x00000000ffffffff
>>>> A=0xfffffe00000000e8
>>>> 32985546173000: system.switch_cpus T0 : @__do_page_fault+716.32892 :
>>>> Microcode_ROM : ld t4, IDTR:[t4] : MemRead : D=0x81a08e00001015d0
>>>> A=0xfffffe00000000e0
>>>> 32985546173250: system.switch_cpus T0 : @__do_page_fault+716.32893 :
>>>> Microcode_ROM : chks , t4b, 0x3 : IntAlu :
>>>> 32985546173500: system.switch_cpus T0 : @__do_page_fault+716.32894 :
>>>> Microcode_ROM : srli t10, t4, 0x10 : IntAlu : D=0x000081a08e000010
>>>> 32985546173750: system.switch_cpus T0 : @__do_page_fault+716.32895 :
>>>> Microcode_ROM : andi t5, t10, 0xf8 : IntAlu : D=0x0000000000000010
>>>> 32985546174000: system.switch_cpus T0 : @__do_page_fault+716.32896 :
>>>> Microcode_ROM : andi t0w, t10w, 0x4 : IntAlu : D=0x0000000000000020
>>>> 32985546174250: system.switch_cpus T0 : @__do_page_fault+716.32897 :
>>>> Microcode_ROM : br 0x8084 : No_OpClass :
>>>> 32985546176500: system.switch_cpus T0 : @__do_page_fault+716.32900 :
>>>> Microcode_ROM : ld t3, TSG:[t5] : MemRead : D=0x00af9b000000ffff
>>>> A=0xfffffe0000001010
>>>> 32985546176750: system.switch_cpus T0 : @__do_page_fault+716.32901 :
>>>> Microcode_ROM : chks , t3, 0x7 : IntAlu :
>>>> 32985546177000: system.switch_cpus T0 : @__do_page_fault+716.32902 :
>>>> Microcode_ROM : wrdl %ctrl145, t3, t10 : IntAlu : D=0x000000000000abd0
>>>> 32985546177250: system.switch_cpus T0 : @__do_page_fault+716.32903 :
>>>> Microcode_ROM : wrdh t9, t4, t2 : IntAlu : D=0xffffffff81a015d0
>>>> 32985546177500: system.switch_cpus T0 : @__do_page_fault+716.32904 :
>>>> Microcode_ROM : rdsel t11b, t11b, %ctrl128 : IntAlu : D=0x0000000000000000
>>>> 32985546177750: system.switch_cpus T0 : @__do_page_fault+716.32905 :
>>>> Microcode_ROM : rdattr t10, %ctrl184, : IntAlu : D=0x000000000000abd0
>>>> 32985546178000: system.switch_cpus T0 : @__do_page_fault+716.32906 :
>>>> Microcode_ROM : andi t10, t10, 0x3 : IntAlu : D=0x0000000000000000
>>>> 32985546178250: system.switch_cpus T0 : @__do_page_fault+716.32907 :
>>>> Microcode_ROM : rdattr t5, %ctrl179, : IntAlu : D=0x000000000000abd0
>>>> 32985546178500: system.switch_cpus T0 : @__do_page_fault+716.32908 :
>>>> Microcode_ROM : andi t5, t5, 0x3 : IntAlu : D=0x0000000000000000
>>>> 32985546178750: system.switch_cpus T0 : @__do_page_fault+716.32909 :
>>>> Microcode_ROM : sub t0, t5, t10 : IntAlu : D=0x0000000000000020
>>>> 32985546179000: system.switch_cpus T0 : @__do_page_fault+716.32910 :
>>>> Microcode_ROM : mov t11b, t0b, t0b : IntAlu : D=0x0000000000000000
>>>> 32985546179250: system.switch_cpus T0 : @__do_page_fault+716.32911 :
>>>> Microcode_ROM : srli t12, t4, 0x20 : IntAlu : D=0x0000000081a08e00
>>>> 32985546179500: system.switch_cpus T0 : @__do_page_fault+716.32912 :
>>>> Microcode_ROM : andi t12, t12, 0x7 : IntAlu : D=0x0000000000000000
>>>> 32985546179750: system.switch_cpus T0 : @__do_page_fault+716.32913 :
>>>> Microcode_ROM : subi t0, t12, 0x1 : IntAlu : D=0x0000000000000008
>>>> 32985546180000: system.switch_cpus T0 : @__do_page_fault+716.32914 :
>>>> Microcode_ROM : br 0x8096 : No_OpClass :
>>>> 32985546215500: system.switch_cpus T0 : @__do_page_fault+716.32915 :
>>>> Microcode_ROM : br 0x8098 : No_OpClass :
>>>> 32985546217500: system.switch_cpus T0 : @__do_page_fault+716.32916 :
>>>> Microcode_ROM : mov t6, t6, rsp : IntAlu : D=0xfffffe0000002000
>>>> 32985546217750: system.switch_cpus T0 : @__do_page_fault+716.32917 :
>>>> Microcode_ROM : br 0x8099 : No_OpClass :
>>>> 32985546219750: system.switch_cpus T0 : @__do_page_fault+716.32921 :
>>>> Microcode_ROM : andi t6b, t6b, 0xf0 : IntAlu : D=0xfffffe0000002000
>>>> 32985546220000: system.switch_cpus T0 : @__do_page_fault+716.32922 :
>>>> Microcode_ROM : subi t6, t6, 0x30 : IntAlu : D=0xfffffe0000001fd0
>>>> 32985546220250: system.switch_cpus T0 : @__do_page_fault+716.32923 :
>>>> Microcode_ROM : wrip , t0, t9 : IntAlu :
>>>> 32985546222250: system.switch_cpus T0 : @__do_page_fault+716.32924 :
>>>> Microcode_ROM : srli t5, t4, 0x10 : IntAlu : D=0x000081a08e000010
>>>> 32985546222500: system.switch_cpus T0 : @__do_page_fault+716.32925 :
>>>> Microcode_ROM : andi t5, t5, 0xff : IntAlu : D=0x0000000000000010
>>>> 32985546222750: system.switch_cpus T0 : @__do_page_fault+716.32926 :
>>>> Microcode_ROM : wrdl %ctrl140, t3, t5 : IntAlu : D=0x000000000000abd0
>>>> 32985546226500: system.switch_cpus T0 : @__do_page_fault+716.32927 :
>>>> Microcode_ROM : limm t10, 0 : IntAlu : D=0x0000000000000000
>>>> 32985546226750: system.switch_cpus T0 : @__do_page_fault+716.32928 :
>>>> Microcode_ROM : rdsel t10w, t10w, %ctrl127 : IntAlu : D=0x0000000000000010
>>>> 32985546227000: system.switch_cpus T0 : @__do_page_fault+716.32929 :
>>>> Microcode_ROM : wrsel %ctrl127, t5w, : IntAlu : D=0x0000000000000010
>>>> 32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: #PF(0x3) at
>>>> 0xfffffe0000001fd0

>>>> This page fault keeps happening over and over and the execution never
>>>> continues. For some benchmarks it happens shortly after restoring the
>>>> checkpoint, for others it happens later, and for some it may never appear
>>>> at all. I should also mention that the checkpoint I restore is taken a
>>>> reasonable time after the start of the benchmark execution (around 2% of
>>>> committed instructions) using AtomicSimpleCPU. I then restore with
>>>> DerivO3CPU or with another CPU type of my own, always derived from
>>>> DerivO3CPU.

>>>> I am sorry for the long email; I tried to be as descriptive and
>>>> comprehensive as possible. I would really appreciate your help, because
>>>> my knowledge of gem5 is not enough for me to solve this. I look forward
>>>> to hearing from anyone who has any idea.
>>>> Thank you very much in advance.


>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
