BTW, I'm using the X86 ISA.

On Mon, Jan 26, 2015 at 10:34 AM, Zehan Cui <zehan....@gmail.com> wrote:
> Hi Andreas,
>
> This problem still bothers me, and I have found more related problems.
>
> The problem above is that the O3 CPU is stuck in the following loop (for at
> least 1 billion instructions) while doing page_fault():
>
> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>
> I've already taken checkpoints before the region of interest and tried to
> initialize all objects before that point, but page faults still occur
> during execution.
>
> For another application without page faults, the O3 CPU is stuck in the
> following loop while doing omp_unset_lock():
>
> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+5 : NOP : IntAlu :
> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+7 : cmp DS:[rdi], 0
> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+10 : jle 0xfffffffffffffff9
>
> Then I tried to boot Linux on a 4-core system using timing CPUs; however,
> CPU0 is also stuck in a loop for at least 195,498,501,784,500 ticks:
>
> 197494667971500: system.cpu0 T0 : @__smp_call_function+160 : NOP : IntAlu :
> 197494668076500: system.cpu0 T0 : @__smp_call_function+162 : cmp rbx, DS:[rsp + 0x14]
> 197494668115500: system.cpu0 T0 : @__smp_call_function+166 : jnz 0xfffffffffffffff8
>
> The other CPUs remain idle except for processing apic_timer_interrupt()
> every 4 ms. The terminal stops at:
>
> Booting processor 1/4 APIC 0x1
> Initializing CPU#1
> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> Fake M5 x86_64 CPU stepping 01
> Booting processor 2/4 APIC 0x2
> Initializing CPU#2
> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> Fake M5 x86_64 CPU stepping 01
> Booting processor 3/4 APIC 0x3
> Initializing CPU#3
> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> Fake M5 x86_64 CPU stepping 01
> Brought up 4 CPUs
> migration_cost=185
>
> I have tried versions e5936c2d53a0 and a0cb57e1c072, and both show these
> problems.
>
> Do you have any ideas?
>
> Thanks,
> Zehan
>
> On Tue, Jan 20, 2015 at 5:01 PM, Zehan Cui <zehan....@gmail.com> wrote:
>
>> Hi Andreas,
>>
>> The atomic CPU and the o3 CPU do execute the same instructions, except
>> that the atomic CPU exits the loop above soon, while the o3 CPU is stuck
>> in the loop for at least one billion instructions. My last mail showed
>> only the instructions of the stuck loop. The whole instruction sequence
>> contains:
>> @page_fault
>> @error_entry
>> @do_page_fault
>> @find_vma
>> @__handle_mm_fault
>> @filemap_nopage
>> @find_get_page
>> ...
>> This instruction sequence looks like page-fault handling.
>>
>> There is an uninitialized static array in the source code, which may
>> cause the page fault. I'll initialize the array before the checkpoint
>> and see what happens. But it's still strange that the o3 CPU cannot exit
>> the loop for such a long time.
>>
>> Thanks,
>> Zehan
>>
>> On Tue, Jan 20, 2015 at 4:40 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
>>
>>> Hi Zehan,
>>>
>>> The o3 CPU will invariably take roughly 5-10x as long due to the level
>>> of detail. Are you suggesting the atomic CPU and the o3 CPU are not
>>> executing the same instructions?
>>>
>>> Typically in these cases you want to drop a checkpoint before the
>>> region of interest.
>>>
>>> Andreas
>>>
>>> From: Zehan Cui via gem5-users <gem5-users@gem5.org>
>>> Reply-To: Zehan Cui <zehan....@gmail.com>, gem5 users mailing list <gem5-users@gem5.org>
>>> Date: Tuesday, 20 January 2015 02:14
>>> To: gem5-users <gem5-users@gem5.org>
>>> Subject: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"
>>>
>>> Hi all,
>>>
>>> I am running a multi-threaded application in full-system mode with the
>>> detailed CPU model. I extracted the instruction traces of each CPU and
>>> found that the last CPU keeps executing instructions like the following
>>> for at least 1 billion instructions (max_instructions is set to
>>> 1 billion).
>>>
>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>> 8007580398228: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>>
>>> I also ran the application with the atomic CPU model. The same
>>> instruction sequence appears for a while, but it soon switches to the
>>> instructions of the application.
>>>
>>> This problem has bothered me for a while. Does anyone understand it?
>>>
>>> Thanks,
>>> Zehan
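For anyone triaging a similar hang, here is a minimal sketch of how the repeating window in the traces quoted above could be detected automatically instead of by eye. It is not part of gem5: the names `parse_trace_line`, `trailing_loop` and `find_spin_loops` are hypothetical, and the regular expression assumes trace lines shaped like the ones in this thread (`tick: cpu T0 : @symbol+offset : ...`). Lines without an `@symbol+offset` field are skipped.

```python
import re
from collections import defaultdict, deque

# Assumed line shape, taken from the traces quoted in this thread:
# 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
TRACE_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+T\d+\s+:\s+@([^\s+]+)\+(\d+)")


def parse_trace_line(line):
    """Return (tick, cpu, symbol, offset), or None if the line does not match."""
    m = TRACE_RE.match(line)
    if m is None:
        return None
    tick, cpu, symbol, offset = m.groups()
    return int(tick), cpu, symbol, int(offset)


def trailing_loop(pcs, max_period, min_iterations):
    """Find the smallest period with which the end of `pcs` repeats.

    Returns (period, iterations), or None if no period up to `max_period`
    repeats at least `min_iterations` times.
    """
    for period in range(1, max_period + 1):
        matched = 0
        i = len(pcs) - 1
        # Walk backwards while each entry equals the one `period` steps earlier.
        while i - period >= 0 and pcs[i] == pcs[i - period]:
            matched += 1
            i -= 1
        iterations = (matched + period) // period
        if iterations >= min_iterations:
            return period, iterations
    return None


def find_spin_loops(lines, window=64, max_period=8, min_iterations=3):
    """Map each CPU to (period, iterations, loop body) if its trace ends in a tight loop.

    Only the last `window` instructions per CPU are kept, so the reported
    iteration count is capped by the window size.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for line in lines:
        rec = parse_trace_line(line)
        if rec is not None:
            _, cpu, symbol, offset = rec
            recent[cpu].append((symbol, offset))

    stuck = {}
    for cpu, history in recent.items():
        pcs = list(history)
        found = trailing_loop(pcs, max_period, min_iterations)
        if found is not None:
            period, iterations = found
            stuck[cpu] = (period, iterations, sorted(set(pcs[-period:])))
    return stuck
```

Passing the lines of a trace file to `find_spin_loops` returns one entry per CPU whose most recent instructions form a short repeating sequence, which makes it easier to see which core is spinning and in which kernel function. It only flags the symptom; it says nothing about why the condition being polled is never satisfied.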