Hi Joel, I am using revision 10124. I removed all of my own modifications just to be safe.
Running with gem5.opt and restoring from a boot-up checkpoint with --debug-flag=Exec, it appears that the CPU is stuck in some sort of infinite loop, executing this continuously:

5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000
5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.1 : CMP_M_I : ld t1d, DS:[rdi] : MemRead : D=0x00000000fffffffe A=0xffffffff80822400
5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.2 : CMP_M_I : sub t0d, t1d, t2d : IntAlu : D=0x0000000000000000
5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897
5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.1 : JLE_I : limm t2, 0xfffffffffffffff9 : IntAlu : D=0xfffffffffffffff9
5268959012000: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.2 : JLE_I : wrip , t1, t2 : IntAlu :
5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :
5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000
5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.1 : CMP_M_I : ld t1d, DS:[rdi] : MemRead : D=0x00000000fffffffe A=0xffffffff80822400
5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.2 : CMP_M_I : sub t0d, t1d, t2d : IntAlu : D=0x0000000000000000
5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897
5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.1 : JLE_I : limm t2, 0xfffffffffffffff9 : IntAlu : D=0xfffffffffffffff9
5268959012000: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.2 : JLE_I : wrip , t1, t2 : IntAlu :
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+18.1 : CMP_M_I : ld t1d, DS:[rdi] : MemRead : D=0x00000000fffffffe A=0xffffffff80822400
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+18.2 : CMP_M_I : sub t0d, t1d, t2d : IntAlu : D=0x0000000000000000
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.1 : JLE_I : limm t2, 0xfffffffffffffff9 : IntAlu : D=0xfffffffffffffff9
5268959012500: system.switch_cpus1 T0 : @_spin_lock_irqsave+21.2 : JLE_I : wrip , t1, t2 : IntAlu :
5268959013000: system.switch_cpus1 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :

...and so on, repeating without stopping.

Using --debug-flag=LocalApic, the output does indeed stop shortly after restoring from the checkpoint. The last output is:

..
5269570990500: system.cpu1.interrupts: Reported pending regular interrupt.
5269570990500: system.cpu1.interrupts: Reported pending regular interrupt.
5269570990500: system.cpu1.interrupts: Generated regular interrupt fault object.
5269570990500: system.cpu1.interrupts: Reported pending regular interrupt.
5269570990500: system.cpu1.interrupts: Interrupt 239 sent to core.
5269571169000: system.cpu1.interrupts: Writing Local APIC register 5 at offset 0xb0 as 0.

...and no more output from this point on.

I appreciate your help tremendously.

Ivan

On Fri, May 16, 2014 at 11:18 AM, Joel Hestness <jthestn...@gmail.com> wrote:

> Hi Ivan,
> I believe that the email thread you previously referenced was related to
> a bug that we identified and fixed with changeset
> 9624<http://permalink.gmane.org/gmane.comp.emulators.m5.devel/19326>.
> That bug was causing interrupts to be dropped in x86 when running with the
> O3 CPU. Are you using a version of gem5 after that changeset? If not, I'd
> recommend updating to a more recent version and trying to replicate this
> issue again.
>
> If you are using a more recent version of gem5, first, please let us
> know which changeset and whether you've made any changes. Then, I'd
> recommend compiling gem5.opt and using the DPRINTF tracing functionality to
> see if you can zero in on what is happening. Specifically, first try
> passing the flag --debug-flag=Exec to look at what the CPU cores are
> executing (you may also want to pass --trace-start=<<tick>> with a
> simulator tick time close to when the hang happens). This trace will
> include Linux kernel symbols for at least some of the lines if executing in
> the kernel (e.g. handling an interrupt). If you've compiled your benchmark
> without debugging symbols, it may just show the memory addresses of
> instructions being executed within the application. I will guess that
> you'll see kernel symbols for at least some of the executed instructions
> for interrupts.
>
> If it appears that the CPUs are continuing to execute, try running with
> --debug-flag=LocalApic. This will print the interrupts that each core is
> receiving, and if it stops printing at any point, it means something has
> gone wrong and we'd have to do some deeper digging.
>
> Keep us posted on what you find,
> Joel
>
>
>
> On Thu, May 15, 2014 at 11:16 PM, Ivan Stalev <ids...@psu.edu> wrote:
>
>> Hi Joel,
>>
>> I have tried several different kernels and disk images, including the
>> default ones provided on the GEM5 website in the x86-system.tar.bz2
>> download. I run with these commands:
>>
>> build/X86/gem5.fast -d m5out/test_run configs/example/fs.py
>> --kernel=/home/mdl/ids103/full_system_images/binaries/x86_64-vmlinux-2.6.22.9.smp
>> -n 2 --mem-size=4GB --cpu-type=atomic --cpu-clock=2GHz
>> --script=rcs_scripts/run.rcS --max-checkpoints=1
>>
>> My run.rcS script simply contains:
>>
>> #!/bin/sh
>> /sbin/m5 resetstats
>> /sbin/m5 checkpoint
>> echo 'booted'
>> /extras/run
>> /sbin/m5 exit
>>
>> where "/extras/run" is simply a C program with an infinite loop that
>> prints a counter.
>>
>> I then restore:
>>
>> build/X86/gem5.fast -d m5out/test_run configs/example/fs.py
>> --kernel=/home/mdl/ids103/full_system_images/binaries/x86_64-vmlinux-2.6.22.9.smp
>> -r 1 -n 2 --mem-size=4GB --cpu-type=detailed --cpu-clock=2GHz --caches
>> --l2cache --num-l2caches=1 --l1d_size=32kB --l1i_size=32kB --l1d_assoc=4
>> --l1i_assoc=4 --l2_size=4MB --l2_assoc=8 --cacheline_size=64
>>
>> I specified the disk image file in Benchmarks.py. Restoring from the same
>> checkpoint and running in atomic mode works fine. I also tried booting the
>> system in detailed and letting it run for a while, but once it boots, there
>> is no more output. So it seems that checkpointing is not the issue. The
>> "run" program is just a dummy, and the same issue also persists when
>> running SPEC benchmarks or any other program.
>>
>> My dummy program is simply:
>>
>> int count=0;
>> printf("**************************** HEYY \n");
>> while(1)
>> printf("\n %d \n", count++);
>>
>> Letting it run for a while, the only output is exactly this:
>>
>> booted
>> *******
>>
>> It doesn't even finish printing the first printf.
>>
>> Another thing to add: In another scenario, I modified the kernel to call
>> an m5 pseudo instruction on every context switch, and then GEM5 prints that
>> a context switch occurred. Once again, in atomic mode this worked as
>> expected. However, in detailed, even the GEM5 (printf inside GEM5 itself)
>> output stopped along with the system output in the terminal.
>>
>> Thank you for your help.
>>
>> Ivan
>>
>>
>> On Thu, May 15, 2014 at 10:51 PM, Joel Hestness <jthestn...@gmail.com> wrote:
>>
>>> Hi Ivan,
>>> Can you please give more detail on what you're running? Specifically,
>>> can you give your command line, and which kernel, disk image you're using?
>>> Are you using checkpointing?
>>>
>>> Joel
>>>
>>>
>>> On Mon, May 12, 2014 at 10:52 AM, Ivan Stalev via gem5-users <
>>> gem5-users@gem5.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am running X86 in full system mode. When running just 1 CPU, both
>>>> atomic and detailed mode work fine. However, with more than 1 CPU, atomic
>>>> works fine, but in detailed mode the system appears to hang shortly after
>>>> boot-up. GEM5 doesn't crash, but the system stops having any output.
>>>> Looking at the stats, it appears that instructions are still being
>>>> committed, but the actual applications/benchmarks are not making progress.
>>>> The issue persists with the latest version of GEM5. I also tried two
>>>> different kernel versions and several different disk images.
>>>>
>>>> I might be experiencing what seems to be the same issue that was found
>>>> about a year ago but appears to not have been fixed:
>>>> https://www.mail-archive.com/gem5-dev@gem5.org/msg08839.html
>>>>
>>>> Can anyone reproduce this or know of a solution?
>>>>
>>>> Thank you,
>>>>
>>>> Ivan
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> gem5-users@gem5.org
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>>
>>>
>>>
>>>
>>> --
>>> Joel Hestness
>>> PhD Student, Computer Architecture
>>> Dept. of Computer Science, University of Wisconsin - Madison
>>> http://pages.cs.wisc.edu/~hestness/
>>>
>>
>>
>
>
> --
> Joel Hestness
> PhD Student, Computer Architecture
> Dept. of Computer Science, University of Wisconsin - Madison
> http://pages.cs.wisc.edu/~hestness/
>
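
As a rough aid for reading long --debug-flag=Exec traces like the one at the top of this message, the sketch below tallies, per CPU, which symbol+offset pairs show up most often. The regular expression is inferred only from the trace lines quoted in this thread, so it is an assumption about the format and not a gem5 API. The names summarize and looks_stuck and the threshold values are hypothetical, chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Leading fields of an Exec trace line, as they appear in this thread:
#   <tick>: <cpu> T<thread> : @<symbol>+<offset>[.<micro-op>] : ...
# This pattern is an assumption based on the quoted output only.
LINE_RE = re.compile(
    r"^\s*(?P<tick>\d+):\s+(?P<cpu>\S+)\s+T\d+\s+:\s+"
    r"@(?P<sym>[^+\s]+)\+(?P<off>\d+)(?:\.\d+)?\s+:"
)


def summarize(lines):
    """Return {cpu: Counter({(symbol, offset): count})} for parseable lines."""
    per_cpu = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines without an @symbol+offset field are skipped
            key = (m.group("sym"), int(m.group("off")))
            per_cpu[m.group("cpu")][key] += 1
    return per_cpu


def looks_stuck(counter, max_distinct=8, min_samples=1000):
    """Heuristic: many samples spread over very few distinct locations.

    Both thresholds are arbitrary illustration values.
    """
    total = sum(counter.values())
    return total >= min_samples and len(counter) <= max_distinct


# Small demo using three lines copied from the trace in this thread.
sample = [
    "5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+16 : NOP : IntAlu :",
    "5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+18.0 : CMP_M_I : limm t2d, 0 : IntAlu : D=0x0000000000000000",
    "5268959012500: system.switch_cpus0 T0 : @_spin_lock_irqsave+21.0 : JLE_I : rdip t1, %ctrl153, : IntAlu : D=0xffffffff80596897",
]
hot = summarize(sample)
for cpu, counts in hot.items():
    print(cpu, counts.most_common(3))
```

To use it on a real run, pass the lines of a saved trace file to summarize() and call looks_stuck() on each CPU's counter. In the excerpt quoted above, every line for both switch_cpus0 and switch_cpus1 falls in _spin_lock_irqsave at offsets 16, 18 and 21, which is the pattern this heuristic is meant to flag. Lines that show raw addresses instead of symbols are ignored by this sketch.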