Thanks. So I'd better switch to another ISA.

Zehan

On Mon, Jan 26, 2015 at 4:30 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
> Hi Zehan,
>
> There are not too many people using X86 full-system (based on what I’ve
> seen at least), and as such it is not very well tested. I think it’s fair
> to say that ARM is the most well-tested ISA, especially in full-system. ARM
> full-system also supports recent Linux kernels
> (http://www.gem5.org/Running_gem5#Experimenting_with_DVFS).
>
> Andreas
>
> From: Zehan Cui <zehan....@gmail.com>
> Date: Monday, 26 January 2015 02:36
> To: Andreas Hansson <andreas.hans...@arm.com>
> Cc: gem5 users mailing list <gem5-users@gem5.org>
> Subject: Re: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"
>
> btw, I'm using X86 ISA.
>
> On Mon, Jan 26, 2015 at 10:34 AM, Zehan Cui <zehan....@gmail.com> wrote:
>
>> Hi Andreas,
>>
>> This problem still bothers me, and I have found more related problems.
>>
>> The above problem is that the O3 CPU is stuck in the following loop (for
>> at least 1 billion instructions) while doing page_fault():
>>
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>
>> I've already taken checkpoints before the region of interest, and tried
>> to initialize all objects before that. But there are still page faults
>> during execution.
>>
>> For another application without page faults, the O3 CPU is stuck in the
>> following loop while doing omp_unset_lock():
>>
>> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+5 : NOP : IntAlu :
>> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+7 : cmp DS:[rdi], 0
>> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+10 : jle 0xfffffffffffffff9
>>
>> Then I tried to boot Linux on a 4-core system using timing CPUs;
>> however, CPU0 is also stuck in a loop for at least 195,498,501,784,500
>> ticks:
>>
>> 197494667971500: system.cpu0 T0 : @__smp_call_function+160 : NOP : IntAlu :
>> 197494668076500: system.cpu0 T0 : @__smp_call_function+162 : cmp rbx, DS:[rsp + 0x14]
>> 197494668115500: system.cpu0 T0 : @__smp_call_function+166 : jnz 0xfffffffffffffff8
>>
>> The other CPUs remain idle except for processing apic_timer_interrupt()
>> every 4 ms. The terminal stops at:
>>
>> Booting processor 1/4 APIC 0x1
>> Initializing CPU#1
>> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Fake M5 x86_64 CPU stepping 01
>> Booting processor 2/4 APIC 0x2
>> Initializing CPU#2
>> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Fake M5 x86_64 CPU stepping 01
>> Booting processor 3/4 APIC 0x3
>> Initializing CPU#3
>> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Fake M5 x86_64 CPU stepping 01
>> Brought up 4 CPUs
>> migration_cost=185
>>
>> I have tried versions e5936c2d53a0 and a0cb57e1c072, and both have
>> these problems.
>>
>> Do you have any ideas?
>>
>> thanks,
>> Zehan
>>
>> On Tue, Jan 20, 2015 at 5:01 PM, Zehan Cui <zehan....@gmail.com> wrote:
>>
>>> Hi Andreas,
>>>
>>> The atomic CPU and o3 CPU do execute the same instructions, except
>>> that the atomic CPU exits the above loop soon, while the o3 CPU is stuck in
>>> the loop for at least one billion instructions. My last mail showed only
>>> the instructions of the stuck loop. The whole instruction sequence contains:
>>> @page_fault
>>> @error_entry
>>> @do_page_fault
>>> @find_vma
>>> @__handle_mm_fault
>>> @filemap_nopage
>>> @find_get_page
>>> ...
>>> This instruction sequence looks like page-fault handling.
>>>
>>> There is a static array in the source code without initialization,
>>> which may cause the page fault. I'll initialize the array before the
>>> checkpoint and see what happens. But it's still strange that the o3 CPU
>>> cannot exit the loop for such a long time.
>>>
>>> Thanks,
>>> Zehan
>>>
>>>
>>> On Tue, Jan 20, 2015 at 4:40 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
>>>
>>>> Hi Zehan,
>>>>
>>>> The o3 CPU will invariably take roughly 5-10x as long due to the
>>>> level of detail. Are you suggesting the atomic CPU and the o3 CPU are not
>>>> executing the same instructions?
>>>>
>>>> Typically in these cases you want to drop a checkpoint before the
>>>> region of interest.
>>>>
>>>> Andreas
>>>>
>>>> From: Zehan Cui via gem5-users <gem5-users@gem5.org>
>>>> Reply-To: Zehan Cui <zehan....@gmail.com>, gem5 users mailing list <gem5-users@gem5.org>
>>>> Date: Tuesday, 20 January 2015 02:14
>>>> To: gem5-users <gem5-users@gem5.org>
>>>> Subject: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"
>>>>
>>>> Hi all,
>>>>
>>>> I run a multi-threaded application in full-system mode with the detailed
>>>> CPU model. I extracted the instruction traces of each CPU and found that
>>>> the last CPU keeps executing instructions like the following for at least
>>>> 1 billion instructions (max_instructions is set to 1 billion).
>>>>
>>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+135 : cmp DS:[rbp], 0
>>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+139 : jnz 0xfffffffffffffff8
>>>> 8007580398228: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
>>>>
>>>>
>>>> I also ran the application with the atomic CPU model. The same instruction
>>>> sequence appears for a while, but soon switches to the instructions of the
>>>> application.
>>>>
>>>> This problem has bothered me for a while. Does anyone understand it?
>>>>
>>>> thanks,
>>>> zehan
>>>>
>>>> -- IMPORTANT NOTICE: The contents of this email and any attachments are
>>>> confidential and may also be privileged. If you are not the intended
>>>> recipient, please notify the sender immediately and do not disclose the
>>>> contents to any other person, use it for any purpose, or store or copy the
>>>> information in any medium. Thank you.
>>>>
>>>> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>>>> Registered in England & Wales, Company No: 2557590
>>>> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1
>>>> 9NJ, Registered in England & Wales, Company No: 2548782
>>>>
>>>
>>>
>>
>
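
For anyone hitting the same symptom: the traces quoted in this thread all show one CPU revisiting the same few offsets inside a single kernel symbol (flush_tlb_others, _spin_lock, __smp_call_function). Below is a minimal sketch of a script that flags that pattern in a trace file. It assumes the line layout shown in the quoted traces ("tick: cpu T0 : @symbol+offset : ..."); the function name find_spinning_cpus and the min_repeats threshold are hypothetical names chosen for illustration, not part of gem5.

```python
import re

# Matches trace lines laid out like the ones quoted in this thread, e.g.
# 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133 : NOP : IntAlu :
TRACE_RE = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+T\d+\s+:\s+@([^\s+]+)(?:\+(\d+))?\s+:"
)


def find_spinning_cpus(lines, min_repeats=3):
    """Return {cpu: (symbol, repeats)} for CPUs that look stuck.

    A CPU is reported when its most recent run of consecutive trace
    lines stays inside one symbol and some symbol+offset in that run
    has been executed at least min_repeats times.
    """
    current = {}  # cpu -> symbol of the run the CPU is currently in
    visits = {}   # cpu -> {offset: times executed within the current run}
    for line in lines:
        m = TRACE_RE.match(line)
        if not m:
            continue  # skip anything that is not an instruction trace line
        _tick, cpu, symbol, offset = m.groups()
        if current.get(cpu) != symbol:
            # The CPU entered a different symbol: start a new run.
            current[cpu] = symbol
            visits[cpu] = {}
        visits[cpu][offset] = visits[cpu].get(offset, 0) + 1

    stuck = {}
    for cpu, counts in visits.items():
        top = max(counts.values())
        if top >= min_repeats:
            stuck[cpu] = (current[cpu], top)
    return stuck


if __name__ == "__main__":
    import sys

    # Usage: python find_spin.py trace.txt
    with open(sys.argv[1]) as trace:
        for cpu, (symbol, repeats) in sorted(find_spinning_cpus(trace).items()):
            print(f"{cpu}: {repeats} passes through @{symbol}")
```

This only flags candidates. A short, legitimate spin-wait produces the same pattern, so the repeat count still has to be judged against how long the wait is expected to last, and comparing the result against an atomic-CPU trace of the same region (as done earlier in the thread) remains the useful cross-check.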
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users