By the way, I'm using the x86 ISA.
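
For readers following the traces quoted below: the branch operands (for example 0xfffffffffffffff8) read as signed 64-bit displacements relative to the next instruction. The sketch below decodes them and checks that each branch jumps back to the NOP at the top of its loop. It assumes each conditional jump is 2 bytes long (the short x86 Jcc encoding); that length is not printed in the trace, and the function names are made up for illustration.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit unsigned value as a two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def branch_target(branch_offset: int, displacement: int, insn_len: int = 2) -> int:
    """Offset within the symbol that a taken branch jumps to.

    insn_len=2 is an assumption (short Jcc rel8 encoding), not something
    the trace states.
    """
    return branch_offset + insn_len + to_signed64(displacement)


# (offset of the branch within the symbol, printed displacement),
# copied from the three loops quoted in this thread.
loops = {
    "flush_tlb_others": (139, 0xFFFFFFFFFFFFFFF8),
    "_spin_lock": (10, 0xFFFFFFFFFFFFFFF9),
    "__smp_call_function": (166, 0xFFFFFFFFFFFFFFF8),
}

for symbol, (offset, disp) in loops.items():
    print(f"{symbol}+{offset} -> {symbol}+{branch_target(offset, disp)}")
```

Under that assumption the targets come out as +133, +5 and +160, which are exactly the NOP lines that open each quoted loop, so all three are tight spin-wait loops on a memory location or register compare.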

On Mon, Jan 26, 2015 at 10:34 AM, Zehan Cui <zehan....@gmail.com> wrote:

> Hi Andreas,
>
> This problem still bothers me, and I have found more related problems.
>
> The original problem is that the O3 CPU is stuck in the following loop (for
> at least 1 billion instructions) while handling page_fault():
>
> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP                      : IntAlu :
> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp DS:[rbp], 0
> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz 0xfffffffffffffff8
>
> I've already taken checkpoints before the region of interest and tried to
> initialize all objects before that point, but page faults still occur
> during execution.
>
> For another application without page faults, the O3 CPU is stuck in the
> following loop while doing omp_unset_lock():
>
> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+5    : NOP                      : IntAlu :
> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+7    : cmp DS:[rdi], 0
> 8418495964433: system.switch_cpus_14 T0 : @_spin_lock+10    : jle 0xfffffffffffffff9
>
> Then I tried to boot Linux on a 4-core system using timing CPUs. However,
> CPU0 also gets stuck in a loop, for at least 195,498,501,784,500 ticks:
>
> 197494667971500: system.cpu0 T0 : @__smp_call_function+160    : NOP                      : IntAlu :
> 197494668076500: system.cpu0 T0 : @__smp_call_function+162    : cmp rbx, DS:[rsp + 0x14]
> 197494668115500: system.cpu0 T0 : @__smp_call_function+166    : jnz 0xfffffffffffffff8
>
>
> The other CPUs remain idle, apart from processing apic_timer_interrupt()
> every 4 ms. The terminal output stops at:
>
> Booting processor 1/4 APIC 0x1
> Initializing CPU#1
> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> Fake M5 x86_64 CPU stepping 01
> Booting processor 2/4 APIC 0x2
> Initializing CPU#2
> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> Fake M5 x86_64 CPU stepping 01
> Booting processor 3/4 APIC 0x3
> Initializing CPU#3
> Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 1024K (64 bytes/line)
> Fake M5 x86_64 CPU stepping 01
> Brought up 4 CPUs
> migration_cost=185
>
>
> I have tried versions e5936c2d53a0 and a0cb57e1c072, and both show these
> problems.
>
> Do you have any idea what might cause this?
>
> thanks,
> Zehan
>
> On Tue, Jan 20, 2015 at 5:01 PM, Zehan Cui <zehan....@gmail.com> wrote:
>
>> Hi Andreas,
>>
>> The atomic CPU and the o3 CPU do execute the same instructions, except
>> that the atomic CPU exits the above loop quickly, while the o3 CPU stays
>> stuck in it for at least one billion instructions. My last mail only
>> showed the instructions of the stuck loop. The whole instruction sequence
>> contains:
>> @page_fault
>> @error_entry
>> @do_page_fault
>> @find_vma
>> @__handle_mm_fault
>> @filemap_nopage
>> @find_get_page
>> ...
>> This instruction sequence looks like page fault handling.
>>
>> There is an uninitialized static array in the source code, which may be
>> causing the page fault. I'll initialize the array before the checkpoint
>> and see what happens. But it's still strange that the o3 CPU cannot exit
>> the loop for such a long time.
>>
>> Thanks,
>> Zehan
>>
>>
>> On Tue, Jan 20, 2015 at 4:40 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
>>
>>>  Hi Zehan,
>>>
>>>  The o3 CPU will invariably take roughly 5-10x as long due to the level
>>> of detail. Are you suggesting the atomic CPU and the o3 CPU are not
>>> executing the same instructions?
>>>
>>>  Typically in these cases you want to drop a checkpoint before the
>>> region of interest.
>>>
>>>  Andreas
>>>
>>>   From: Zehan Cui via gem5-users <gem5-users@gem5.org>
>>> Reply-To: Zehan Cui <zehan....@gmail.com>, gem5 users mailing list <
>>> gem5-users@gem5.org>
>>> Date: Tuesday, 20 January 2015 02:14
>>> To: gem5-users <gem5-users@gem5.org>
>>> Subject: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"
>>>
>>>  Hi all,
>>>
>>>  I am running a multi-threaded application in full-system mode with the
>>> detailed CPU model. I extracted the instruction trace of each CPU and
>>> found that the last CPU keeps executing the following instructions for at
>>> least 1 billion instructions (max_instructions is set to 1 billion).
>>>
>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP                      : IntAlu :
>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp DS:[rbp], 0
>>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz 0xfffffffffffffff8
>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP                      : IntAlu :
>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp DS:[rbp], 0
>>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz 0xfffffffffffffff8
>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP                      : IntAlu :
>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp DS:[rbp], 0
>>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz 0xfffffffffffffff8
>>> 8007580398228: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP                      : IntAlu :
>>>
>>>
>>>  I also ran the application with the atomic CPU model. The same
>>> instruction sequence appears for a while, but execution soon switches to
>>> the instructions of the application.
>>>
>>>  This problem has bothered me for a while. Does anyone understand what is
>>> going on?
>>>
>>>  thanks,
>>> zehan
>>>
>>>
>>
>>
>
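
The stuck loops quoted above were found by reading the traces by hand. A minimal sketch of automating that is shown here: it scans an instruction trace and reports, per CPU, the longest consecutive run of instructions that stays inside a single symbol. It assumes one instruction per line in the `tick: cpu T0 : @symbol+offset : ...` layout quoted in this thread (unwrapped, unlike the mail-wrapped copies); the regular expression and the name `longest_runs` are illustrative, not part of gem5.

```python
import re

# Matches "<tick>: <cpu> T<n> : @<symbol>[+<offset>]" at the start of a line.
TRACE_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+T\d+\s+:\s+@([\w.]+)(?:\+(\d+))?")


def longest_runs(lines):
    """Return {cpu: (symbol, instruction_count, sorted_offsets)} for the
    longest consecutive run each CPU spends inside one symbol."""
    current = {}  # cpu -> [symbol, count, set of offsets] for the run in progress
    best = {}
    for line in lines:
        match = TRACE_RE.match(line)
        if not match:
            continue  # skip non-trace or continuation lines
        _tick, cpu, symbol, offset = match.groups()
        run = current.get(cpu)
        if run is None or run[0] != symbol:
            run = current[cpu] = [symbol, 0, set()]
        run[1] += 1
        run[2].add(int(offset or 0))
        if cpu not in best or run[1] > best[cpu][1]:
            best[cpu] = (symbol, run[1], sorted(run[2]))
    return best


# Two iterations of the loop from the first mail, one instruction per line.
sample = [
    "8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP : IntAlu :",
    "8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp DS:[rbp], 0",
    "8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz 0xfffffffffffffff8",
    "8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133    : NOP : IntAlu :",
    "8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp DS:[rbp], 0",
    "8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz 0xfffffffffffffff8",
]

for cpu, (symbol, count, offsets) in longest_runs(sample).items():
    print(f"{cpu}: {count} consecutive instructions in {symbol}, offsets {offsets}")
```

On a real trace, a very long run that touches only a handful of offsets (here 133, 135 and 139) is the signature of a spin-wait loop like the ones reported in this thread. For a full trace file, pass the open file object instead of the sample list so the lines are streamed rather than loaded at once.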
_______________________________________________
gem5-users mailing list
gem5-users@gem5.org
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
