Hello,

I should also mention that this problem occurs with a fresh pull of gem5 at commit 2045a5c199c7c7597684c5d7501d5fb55aff9608, with no extra modifications. I used the same repository to take the checkpoints uniformly for SPEC CPU 2017.

I take the checkpoints sequentially with the "AtomicSimpleCPU" cpu-type. That is, a checkpoint is created at instruction "N", gem5 exits, and then I restore checkpoint N to continue and take the next checkpoint at instruction "N+period". This way, every checkpoint has been restored at least once and is known to work with AtomicSimpleCPU.
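To make the sequential scheme concrete, here is a minimal Python sketch of the run schedule it implies. The function name and its parameters are hypothetical, chosen only for illustration. The sketch deals purely in instruction counts as described above and does not settle how a given gem5 version interprets the values passed to its checkpoint flags.

```python
from typing import List, Optional, Tuple


def checkpoint_schedule(
    first: int, period: int, count: int
) -> List[Tuple[Optional[int], int]]:
    """Return one (restore_from, take_at) pair per gem5 run.

    restore_from is None for the very first run, which starts from boot.
    Every later run restores the checkpoint taken by the previous run,
    so each checkpoint is restored once before the next one is created.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")

    runs: List[Tuple[Optional[int], int]] = []
    restore_from: Optional[int] = None
    take_at = first
    for _ in range(count):
        runs.append((restore_from, take_at))
        restore_from = take_at   # next run resumes from this checkpoint
        take_at += period        # and checkpoints one period later
    return runs
```

Each pair corresponds to one gem5 invocation: the first element is the checkpoint to restore (if any) and the second is the instruction count at which the next checkpoint is taken.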
Here is the gem5 command for taking the checkpoints:

    ./build/X86/gem5.opt --redirect-stdout --redirect-stderr --outdir=outdir \
        /configs/example/fs.py --cpu-type=AtomicSimpleCPU -n 1 \
        --mem-type=DDR4_2400_16x4 --mem-size=8GB --fastmem \
        --sys-clock=4GHz --cpu-clock=4GHz \
        --kernel=/path_to_kernel/vmlinux-4-15 \
        --disk-image=/path_to_image/ubuntu-min-16-04.img \
        --checkpoint-dir=/chpts_dir --checkpoint-restore=N --at-instruction \
        --take-checkpoints=N+period --checkpoint-at-end

I have followed the same process for both ARM and x86 using fs.py. However, this stack-related problem does not show up for ARM, but it does for x86 in many of the checkpoints taken for the different benchmarks. The only causes I can think of are the way I created the equivalent Ubuntu image for x86 (simply following Jason's tutorial, http://www.lowepower.com/jason/setting-up-gem5-full-system.html ), or a problem with how a checkpoint saves the current state when it is taken for x86. How is the stack saved and restored between the checkpoints (and I assume it is used/simulated when restoring with AtomicSimpleCPU)?

Here again is the gem5 command I use to restore the checkpoints with DerivO3CPU:

    ./build/X86/gem5.opt -r -e -d /path_to_outdir configs/example/fs.py \
        --cpu-type=DerivO3CPU -n 1 --caches --l2cache --l3cache \
        --mem-type=DDR4_2400_16x4 --mem-size=8GB \
        --sys-clock=4GHz --cpu-clock=4GHz --maxinsts=150000000 \
        --kernel=/path_to_kernel/vmlinux-4-15 \
        --disk-image=/path_to_image/ubuntu-min-16-04.img \
        --checkpoint-dir=/path_to_cpt_dir/ -r N --at-instruction

Since everything has been done with the official gem5 code, even if the problem lies in my configuration (image, kernel) or in the SPEC benchmarks, I think there should be a way to detect a page fault that goes into an infinite loop and stop/exit gem5. I would appreciate any feedback regarding this point.
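As an illustration of the kind of check meant here, the following is a minimal offline sketch in Python, not a gem5 feature. It scans a trace produced with the Faults debug flag and reports when the same page fault (same RIP, same faulting address) repeats several times in a row. The regular expression is modelled only on the single Page-Fault line quoted later in this thread, and the function name and the threshold are assumptions made for the example.

```python
import re
from typing import Iterable, Optional, Tuple

# Modelled on the one Page-Fault line shown in this thread, e.g.
# "32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: #PF(0x3) at 0xfffffe0000001fd0"
# Other gem5 versions may print faults differently (assumption).
FAULT_RE = re.compile(
    r"^\s*(?P<tick>\d+): Page-Fault: RIP (?P<rip>0x[0-9a-fA-F]+): "
    r"vector (?P<vector>\d+): #PF\((?P<code>0x[0-9a-fA-F]+)\) at "
    r"(?P<addr>0x[0-9a-fA-F]+)"
)


def find_fault_loop(
    lines: Iterable[str], threshold: int = 3
) -> Optional[Tuple[int, int]]:
    """Return (rip, addr) once the same fault is seen `threshold` times in a row.

    Lines that are not Page-Fault lines (for example Exec trace lines) are
    skipped and do not interrupt a run of identical faults.
    Returns None if no such run is found.
    """
    last_key: Optional[Tuple[int, int]] = None
    run = 0
    for line in lines:
        match = FAULT_RE.match(line)
        if match is None:
            continue
        key = (int(match["rip"], 16), int(match["addr"], 16))
        if key == last_key:
            run += 1
        else:
            last_key = key
            run = 1
        if run >= threshold:
            return key
    return None
```

One possible use is to run it over the redirected simout file (for example `find_fault_loop(open("simout"))`) and kill the simulation from a wrapper script when it returns a result. A suitable threshold depends on the workload, since a legitimate fault can recur a few times at the same address.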
--
Kleovoulos Kalaitzidis
PhD student - PACAP team
Centre de recherche INRIA Rennes - Bretagne Atlantique
Bâtiment 12E, Bureau E321, Campus de Beaulieu,
35042 Rennes Cedex, France

> From: "Kleovoulos Kalaitzidis" <[email protected]>
> To: "gem5 users mailing list" <[email protected]>
> Sent: Tuesday, November 13, 2018 2:31:30 AM
> Subject: Re: [gem5-users] Microcode_ROM page fault not handled
>
> Hello,
> thank you a lot for your answer, Gabe. I see what you mean about the stack
> seeming to be bad, and I have been trying to investigate why.
> To make a quick test (and prompted by the kernel-related problem in the thread
> I had mentioned:
> https://www.mail-archive.com/[email protected]/msg13058.html )
> I built another kernel version, 4.8.13, and restored my checkpoints with that
> one. You can find a part of the output attached here. I see again that
> Microcode_ROM keeps repeating for a page fault at the same address as before:
> 0xfffffe0000001fd0. This time it seems more specific to me, though, since it is
> related to the kernel function "wake_up_new_task", which I found is called from
> "do_fork". I cannot really understand why the stack does not play well with
> some of the benchmarks, since I take and then restore all my checkpoints in the
> same way. If this output, which differs from the previous one, gives someone an
> idea, please let me know. Thank you a lot for your help.
> --
> Kleovoulos Kalaitzidis
> PhD student - PACAP team
> Centre de recherche INRIA Rennes - Bretagne Atlantique
> Bâtiment 12E, Bureau E321, Campus de Beaulieu,
> 35042 Rennes Cedex, France

>> From: "Gabe Black" <[email protected]>
>> To: "gem5 users mailing list" <[email protected]>
>> Sent: Monday, November 12, 2018 10:48:19 PM
>> Subject: Re: [gem5-users] Microcode_ROM page fault not handled
>>
>> The microcode that's executing is in src/arch/x86/isa/insts/romutil.py I think,
>> and it looks like your stack is bad. That's where the vectoring microcode
>> checks to see that it will be able to write out the interrupt stack frame, and
>> it apparently can't. That triggers another page fault, and it has the same
>> problem. You'll need to determine why your stack ends up out of whack, or why
>> that code might not be handling the stack in an exactly correct way which makes
>> it fault when it shouldn't.
>> Gabe
>>
>> On Mon, Nov 12, 2018 at 8:25 AM Kleovoulos Kalaitzidis <[email protected]>
>> wrote:

>>> Hello,
>>> just to give more detail, I have attached here a part of the simout file from
>>> before the first appearance of the page fault that then keeps repeating.
>>> --
>>> Kleovoulos Kalaitzidis
>>> PhD student - PACAP team
>>> Centre de recherche INRIA Rennes - Bretagne Atlantique
>>> Bâtiment 12E, Bureau E321, Campus de Beaulieu,
>>> 35042 Rennes Cedex, France

>>>> From: "Kleovoulos Kalaitzidis" <[email protected]>
>>>> To: "gem5 users mailing list" <[email protected]>
>>>> Sent: Monday, November 12, 2018 4:09:56 PM
>>>> Subject: [gem5-users] Microcode_ROM page fault not handled
>>>>
>>>> Hello everyone,
>>>> I am currently using FS mode to simulate and execute SPEC benchmarks. The
>>>> image I use is an Ubuntu-16.04 and the kernel I built for it is vmlinux-4-15.
>>>> To set up the FS simulation environment, create the image file and build the
>>>> kernel, I followed Jason's instructions from here:
>>>> http://www.lowepower.com/jason/setting-up-gem5-full-system.html
>>>> I run my simulations with x86 and I have already taken some checkpoints for
>>>> FS, so now I use them to restore and execute the benchmarks.
>>>> However, after some testing I found that, some time after the restore, most
>>>> of them execute an infinite loop of micro-ops without making progress in the
>>>> benchmark: the number of executed instructions (which I print during
>>>> execution) stops changing.
>>>> The gem5 command to restore the first checkpoint is:
>>>>
>>>>     /build/X86/gem5.opt --redirect-stdout --redirect-stderr --outdir=/outdir \
>>>>         /configs/example/fs.py --cpu-type=DerivO3CPU -n 1 --caches --l2cache \
>>>>         --mem-type=DDR4_2400_16x4 --mem-size=8GB \
>>>>         --sys-clock=4GHz --cpu-clock=4GHz \
>>>>         --kernel=/path_to_kernel/vmlinux-4-15 \
>>>>         --disk-image=/path_to_image/ubuntu-min-16-04.img \
>>>>         --checkpoint-dir=/path_to_checkpoint_dir/ -r 1
>>>>
>>>> To tackle the problem I located the recurring loop of micro-ops and saw that
>>>> it keeps executing micro-ops belonging to the instruction Microcode_ROM.
>>>> After some searching I found this older thread where someone else had a quite
>>>> similar problem:
>>>> https://www.mail-archive.com/[email protected]/msg13058.html
>>>> So I followed the same pattern: I used --debug-flags=Exec,LocalApic,Faults
>>>> and I get this output:
>>>>
>>>> 32985546164250: system.switch_cpus T0 : @__do_page_fault+716.32930 : Microcode_ROM : ldst t0, HS:[t6] : MemRead : A=0xfffffe0000001fd0
>>>> 32985546172500: system.switch_cpus T0 : @__do_page_fault+716.32890 : Microcode_ROM : slli t4, t1, 0x4 : IntAlu : D=0x00000000000000e0
>>>> 32985546172750: system.switch_cpus T0 : @__do_page_fault+716.32891 : Microcode_ROM : ld t2, IDTR:[t4 + 0x8] : MemRead : D=0x00000000ffffffff A=0xfffffe00000000e8
>>>> 32985546173000: system.switch_cpus T0 : @__do_page_fault+716.32892 : Microcode_ROM : ld t4, IDTR:[t4] : MemRead : D=0x81a08e00001015d0 A=0xfffffe00000000e0
>>>> 32985546173250: system.switch_cpus T0 : @__do_page_fault+716.32893 : Microcode_ROM : chks , t4b, 0x3 : IntAlu :
>>>> 32985546173500: system.switch_cpus T0 : @__do_page_fault+716.32894 : Microcode_ROM : srli t10, t4, 0x10 : IntAlu : D=0x000081a08e000010
>>>> 32985546173750: system.switch_cpus T0 : @__do_page_fault+716.32895 : Microcode_ROM : andi t5, t10, 0xf8 : IntAlu : D=0x0000000000000010
>>>> 32985546174000: system.switch_cpus T0 : @__do_page_fault+716.32896 : Microcode_ROM : andi t0w, t10w, 0x4 : IntAlu : D=0x0000000000000020
>>>> 32985546174250: system.switch_cpus T0 : @__do_page_fault+716.32897 : Microcode_ROM : br 0x8084 : No_OpClass :
>>>> 32985546176500: system.switch_cpus T0 : @__do_page_fault+716.32900 : Microcode_ROM : ld t3, TSG:[t5] : MemRead : D=0x00af9b000000ffff A=0xfffffe0000001010
>>>> 32985546176750: system.switch_cpus T0 : @__do_page_fault+716.32901 : Microcode_ROM : chks , t3, 0x7 : IntAlu :
>>>> 32985546177000: system.switch_cpus T0 : @__do_page_fault+716.32902 : Microcode_ROM : wrdl %ctrl145, t3, t10 : IntAlu : D=0x000000000000abd0
>>>> 32985546177250: system.switch_cpus T0 : @__do_page_fault+716.32903 : Microcode_ROM : wrdh t9, t4, t2 : IntAlu : D=0xffffffff81a015d0
>>>> 32985546177500: system.switch_cpus T0 : @__do_page_fault+716.32904 : Microcode_ROM : rdsel t11b, t11b, %ctrl128 : IntAlu : D=0x0000000000000000
>>>> 32985546177750: system.switch_cpus T0 : @__do_page_fault+716.32905 : Microcode_ROM : rdattr t10, %ctrl184, : IntAlu : D=0x000000000000abd0
>>>> 32985546178000: system.switch_cpus T0 : @__do_page_fault+716.32906 : Microcode_ROM : andi t10, t10, 0x3 : IntAlu : D=0x0000000000000000
>>>> 32985546178250: system.switch_cpus T0 : @__do_page_fault+716.32907 : Microcode_ROM : rdattr t5, %ctrl179, : IntAlu : D=0x000000000000abd0
>>>> 32985546178500: system.switch_cpus T0 : @__do_page_fault+716.32908 : Microcode_ROM : andi t5, t5, 0x3 : IntAlu : D=0x0000000000000000
>>>> 32985546178750: system.switch_cpus T0 : @__do_page_fault+716.32909 : Microcode_ROM : sub t0, t5, t10 : IntAlu : D=0x0000000000000020
>>>> 32985546179000: system.switch_cpus T0 : @__do_page_fault+716.32910 : Microcode_ROM : mov t11b, t0b, t0b : IntAlu : D=0x0000000000000000
>>>> 32985546179250: system.switch_cpus T0 : @__do_page_fault+716.32911 : Microcode_ROM : srli t12, t4, 0x20 : IntAlu : D=0x0000000081a08e00
>>>> 32985546179500: system.switch_cpus T0 : @__do_page_fault+716.32912 : Microcode_ROM : andi t12, t12, 0x7 : IntAlu : D=0x0000000000000000
>>>> 32985546179750: system.switch_cpus T0 : @__do_page_fault+716.32913 : Microcode_ROM : subi t0, t12, 0x1 : IntAlu : D=0x0000000000000008
>>>> 32985546180000: system.switch_cpus T0 : @__do_page_fault+716.32914 : Microcode_ROM : br 0x8096 : No_OpClass :
>>>> 32985546215500: system.switch_cpus T0 : @__do_page_fault+716.32915 : Microcode_ROM : br 0x8098 : No_OpClass :
>>>> 32985546217500: system.switch_cpus T0 : @__do_page_fault+716.32916 : Microcode_ROM : mov t6, t6, rsp : IntAlu : D=0xfffffe0000002000
>>>> 32985546217750: system.switch_cpus T0 : @__do_page_fault+716.32917 : Microcode_ROM : br 0x8099 : No_OpClass :
>>>> 32985546219750: system.switch_cpus T0 : @__do_page_fault+716.32921 : Microcode_ROM : andi t6b, t6b, 0xf0 : IntAlu : D=0xfffffe0000002000
>>>> 32985546220000: system.switch_cpus T0 : @__do_page_fault+716.32922 : Microcode_ROM : subi t6, t6, 0x30 : IntAlu : D=0xfffffe0000001fd0
>>>> 32985546220250: system.switch_cpus T0 : @__do_page_fault+716.32923 : Microcode_ROM : wrip , t0, t9 : IntAlu :
>>>> 32985546222250: system.switch_cpus T0 : @__do_page_fault+716.32924 : Microcode_ROM : srli t5, t4, 0x10 : IntAlu : D=0x000081a08e000010
>>>> 32985546222500: system.switch_cpus T0 : @__do_page_fault+716.32925 : Microcode_ROM : andi t5, t5, 0xff : IntAlu : D=0x0000000000000010
>>>> 32985546222750: system.switch_cpus T0 : @__do_page_fault+716.32926 : Microcode_ROM : wrdl %ctrl140, t3, t5 : IntAlu : D=0x000000000000abd0
>>>> 32985546226500: system.switch_cpus T0 : @__do_page_fault+716.32927 : Microcode_ROM : limm t10, 0 : IntAlu : D=0x0000000000000000
>>>> 32985546226750: system.switch_cpus T0 : @__do_page_fault+716.32928 : Microcode_ROM : rdsel t10w, t10w, %ctrl127 : IntAlu : D=0x0000000000000010
>>>> 32985546227000: system.switch_cpus T0 : @__do_page_fault+716.32929 : Microcode_ROM : wrsel %ctrl127, t5w, : IntAlu : D=0x0000000000000010
>>>> 32985546231500: Page-Fault: RIP 0xffffffff81057b6c: vector 14: #PF(0x3) at 0xfffffe0000001fd0
>>>>
>>>> This page fault keeps happening over and over and the execution never
>>>> continues. For some benchmarks it happens shortly after restoring the
>>>> checkpoint, for others it happens later, and for some it may never appear at
>>>> all. I should also mention that the checkpoint I restore is taken a
>>>> reasonable time after the start of the benchmark execution (around 2% of
>>>> committed instructions) using AtomicSimpleCPU. Then I restore with DerivO3CPU
>>>> or another cpu type of mine, always derived from DerivO3CPU.
>>>> I am sorry for the long email; I tried to be as descriptive and comprehensive
>>>> as possible. I would really appreciate your help, because my knowledge of
>>>> gem5 is not enough to solve this. I am looking forward to hearing from anyone
>>>> who has any idea...
>>>> Thank you a lot in advance.
>>>> --
>>>> Kleovoulos Kalaitzidis
>>>> PhD student - PACAP team
>>>> Centre de recherche INRIA Rennes - Bretagne Atlantique
>>>> Bâtiment 12E, Bureau E321, Campus de Beaulieu,
>>>> 35042 Rennes Cedex, France
>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
