Hi Andreas,

This problem still bothers me, and I have found more related problems.
The problem above is that the O3 CPU gets stuck in the following loop (for at least 1 billion instructions) while handling page_fault():

  8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
  8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
  8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8

I had already taken checkpoints before the region of interest and tried to initialize all objects before that point, but page faults still occur during execution.

For another application without page faults, the O3 CPU gets stuck in the following loop while executing omp_unset_lock():

  8418495964433: system.switch_cpus_14 T0 : @_spin_lock+5 : NOP : IntAlu :
  8418495964433: system.switch_cpus_14 T0 : @_spin_lock+7 : cmp DS:[rdi], 0
  8418495964433: system.switch_cpus_14 T0 : @_spin_lock+10 : jle 0xfffffffffffffff9

Then I tried to boot Linux on a 4-core system using timing CPUs. CPU0 also gets stuck in a loop, for at least 195,498,501,784,500 ticks:

  197494667971500: system.cpu0 T0 : @__smp_call_function+160 : NOP : IntAlu :
  197494668076500: system.cpu0 T0 : @__smp_call_function+162 : cmp rbx, DS:[rsp + 0x14]
  197494668115500: system.cpu0 T0 : @__smp_call_function+166 : jnz 0xfffffffffffffff8

The other CPUs remain idle, apart from handling apic_timer_interrupt() every 4 ms. The terminal output stops at:

  Booting processor 1/4 APIC 0x1
  Initializing CPU#1
  Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
  CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  CPU: L2 Cache: 1024K (64 bytes/line)
  Fake M5 x86_64 CPU stepping 01
  Booting processor 2/4 APIC 0x2
  Initializing CPU#2
  Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
  CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  CPU: L2 Cache: 1024K (64 bytes/line)
  Fake M5 x86_64 CPU stepping 01
  Booting processor 3/4 APIC 0x3
  Initializing CPU#3
  Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
  CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  CPU: L2 Cache: 1024K (64 bytes/line)
  Fake M5 x86_64 CPU stepping 01
  Brought up 4 CPUs
  migration_cost=185

I have tried revisions e5936c2d53a0 and a0cb57e1c072, and both show these problems. Do you have any idea what might cause this?

Thanks,
Zehan

On Tue, Jan 20, 2015 at 5:01 PM, Zehan Cui <zehan....@gmail.com> wrote:
> Hi Andreas,
>
> The atomic CPU and the O3 CPU do execute the same instructions, except that
> the atomic CPU exits the above loop soon, while the O3 CPU is stuck in the
> loop for at least one billion instructions. My last mail only showed the
> instructions of the stuck loop. The whole instruction sequence contains:
> @page_fault
> @error_entry
> @do_page_fault
> @find_vma
> @__handle_mm_fault
> @filemap_nopage
> @find_get_page
> ...
> This instruction sequence looks like page-fault handling.
>
> There is an uninitialized static array in the source code, which may cause
> the page fault. I'll initialize the array before the checkpoint and see what
> happens. But it's still strange that the O3 CPU cannot exit the loop for
> such a long time.
>
> Thanks,
> Zehan
>
>
> On Tue, Jan 20, 2015 at 4:40 PM, Andreas Hansson <andreas.hans...@arm.com>
> wrote:
>
>> Hi Zehan,
>>
>> The o3 CPU will invariably take roughly 5-10x as long due to the level
>> of detail. Are you suggesting the atomic CPU and the o3 CPU are not
>> executing the same instructions?
>>
>> Typically in these cases you want to drop a checkpoint before the
>> region of interest.
>>
>> Andreas
>>
>> From: Zehan Cui via gem5-users <gem5-users@gem5.org>
>> Reply-To: Zehan Cui <zehan....@gmail.com>, gem5 users mailing list <gem5-users@gem5.org>
>> Date: Tuesday, 20 January 2015 02:14
>> To: gem5-users <gem5-users@gem5.org>
>> Subject: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"
>>
>> Hi all,
>>
>> I ran a multi-threaded application in full-system mode with the detailed
>> CPU model.
>> I extracted the instruction trace of each CPU and found that the last CPU
>> keeps executing instructions like the following for at least 1 billion
>> instructions (max_instructions is set to 1 billion):
>>
>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>> 8007580398228: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>
>> I ran the same application with the atomic CPU model. The same instruction
>> sequence appears for a while, but execution soon switches to the
>> application's own instructions.
>>
>> This problem has bothered me for a while. Does anyone understand what is
>> going on?
>>
>> Thanks,
>> Zehan
>
>
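For what it's worth, all three stuck loops have the same shape: a NOP, a compare against a word in memory, and a short conditional branch back to the NOP (0xfffffffffffffff8 and 0xfffffffffffffff9 are displacements of -8 and -7 bytes, which land exactly on the NOP at +133, +160 and +5). That is a spin-wait: the CPU can only leave once some other CPU writes that word, for example by acknowledging an IPI or releasing a lock. The Python sketch below is a toy model, not gem5 or kernel code; `ShootdownModel`, `handle_ipi`, `flush_others` and `max_spins` are hypothetical names. It assumes a shootdown scheme in which the initiator spins until every target has cleared its pending entry, and shows that one missing acknowledgement is enough to keep the initiator looping.

```python
import threading


class ShootdownModel:
    """Toy model of a TLB-shootdown style spin-wait (all names hypothetical).

    The initiating CPU marks every other CPU as pending, "sends an IPI" to
    each, and then spins until all of them have acknowledged. A target CPU
    clears its own entry only when it actually services the interrupt.
    """

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        self.pending = set()          # stands in for the shared CPU mask
        self.lock = threading.Lock()  # Python has no atomic bit operations

    def handle_ipi(self, cpu):
        """Interrupt handler on a target CPU: acknowledge the request."""
        with self.lock:
            self.pending.discard(cpu)

    def flush_others(self, initiator, is_delivered, max_spins=1_000_000):
        """Spin until every target acknowledges.

        `is_delivered(cpu)` says whether the IPI to `cpu` is ever serviced.
        Returns True if all targets acknowledged, or False after `max_spins`
        polls. The kernel loop has no such bound: if one acknowledgement
        never arrives, it spins forever.
        """
        targets = [c for c in range(self.num_cpus) if c != initiator]
        with self.lock:
            self.pending = set(targets)
        handlers = [threading.Thread(target=self.handle_ipi, args=(c,))
                    for c in targets if is_delivered(c)]
        for h in handlers:
            h.start()
        try:
            for _ in range(max_spins):
                with self.lock:
                    if not self.pending:  # the "cmp [mask], 0" check
                        return True
                # the traced code executes its NOP here and branches back
            return False
        finally:
            for h in handlers:
                h.join()
```

On a 4-CPU model, `flush_others(0, lambda cpu: cpu != 3, max_spins=1000)` returns False with CPU 3 still pending, which is the model's counterpart of switch_cpus_17 never leaving flush_tlb_others. In the simulator, the useful question is then which CPU owes the acknowledgement (or the lock release) and why it never executes the code that provides it.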
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users