Re: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"

Zehan Cui via gem5-users Sun, 25 Jan 2015 18:35:35 -0800

Hi Andreas,

This problem still bothers me, and I have found more related problems.


The original problem is that the O3 CPU gets stuck in the following loop (for
at least 1 billion instructions) while handling page_fault():

8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133    :   NOP                      : IntAlu :
8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp   DS:[rbp], 0
8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz   0xfffffffffffffff8

I have already taken checkpoints before the region of interest and tried to
initialize all objects before that point. But page faults still occur during
execution.

For another application without page faults, the O3 CPU gets stuck in the
following loop while executing omp_unset_lock():

8418495964433: system.switch_cpus_14 T0 : @_spin_lock+5    :   NOP              : IntAlu :
8418495964433: system.switch_cpus_14 T0 : @_spin_lock+7    : cmp   DS:[rdi], 0
8418495964433: system.switch_cpus_14 T0 : @_spin_lock+10    : jle   0xfffffffffffffff9
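
For anyone trying to reproduce this, loops like the two above can be spotted
automatically in the instruction trace. The following is only a minimal
sketch and not part of gem5: it assumes every trace record sits on a single
line (mail wrapping undone) in the format shown above, and the names
parse_record, longest_runs and max_offsets are made up for illustration. It
reports, per CPU, the longest run of consecutive records that stay inside one
symbol while touching only a handful of distinct offsets.

```python
import re

# Assumed record layout (taken from the traces in this mail):
#   <tick>: <cpu name> T<thread> : @<symbol>+<offset>    : <disassembly>
TRACE_RE = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+T\d+\s+:\s+@([^\s:+]+)(?:\+(\d+))?"
)


def parse_record(line):
    """Return (tick, cpu, symbol, offset), or None if the line is not a trace record."""
    m = TRACE_RE.match(line)
    if m is None:
        return None
    tick, cpu, symbol, offset = m.groups()
    return int(tick), cpu, symbol, int(offset or 0)


def longest_runs(lines, max_offsets=8):
    """Per CPU, find the longest run of consecutive records that stays in one
    symbol and touches at most max_offsets distinct offsets (hypothetical
    heuristic for a tight spin loop)."""
    current = {}  # cpu -> run still being extended
    best = {}     # cpu -> snapshot of the longest run seen so far
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue
        tick, cpu, symbol, offset = rec
        run = current.get(cpu)
        fits = (
            run is not None
            and run["symbol"] == symbol
            and (offset in run["offsets"] or len(run["offsets"]) < max_offsets)
        )
        if not fits:
            # Start a new run for this CPU.
            run = {"symbol": symbol, "offsets": set(), "count": 0,
                   "first_tick": tick, "last_tick": tick}
            current[cpu] = run
        run["offsets"].add(offset)
        run["count"] += 1
        run["last_tick"] = tick
        if cpu not in best or run["count"] > best[cpu]["count"]:
            best[cpu] = dict(run, offsets=sorted(run["offsets"]))
    return best


# Small demonstration on the _spin_lock records quoted above.
sample = [
    "8418495964433: system.switch_cpus_14 T0 : @_spin_lock+5    :   NOP : IntAlu :",
    "8418495964433: system.switch_cpus_14 T0 : @_spin_lock+7    : cmp   DS:[rdi], 0",
    "8418495964433: system.switch_cpus_14 T0 : @_spin_lock+10    : jle   0xfffffffffffffff9",
]
for cpu, run in longest_runs(sample).items():
    print(cpu, run["symbol"], run["offsets"], run["count"])
```

On a full trace file, passing the open file object as `lines` should work the
same way. A CPU whose longest run covers millions of records over only three
offsets is most likely spinning on a memory location.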

Then I tried to boot Linux on a 4-core system using timing CPUs. However,
CPU0 also gets stuck in a loop for at least 195,498,501,784,500 ticks:

197494667971500: system.cpu0 T0 : @__smp_call_function+160    :   NOP                : IntAlu :
197494668076500: system.cpu0 T0 : @__smp_call_function+162    : cmp rbx, DS:[rsp + 0x14]
197494668115500: system.cpu0 T0 : @__smp_call_function+166    : jnz 0xfffffffffffffff8


The other CPUs remain idle, except for handling apic_timer_interrupt() every
4 ms. The terminal output stops at:

Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Fake M5 x86_64 CPU stepping 01
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Fake M5 x86_64 CPU stepping 01
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay loop (skipped)... 3999.96 BogoMIPS preset
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Fake M5 x86_64 CPU stepping 01
Brought up 4 CPUs
migration_cost=185
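
To put the tick counts above into perspective, here is a small conversion
sketch. It assumes gem5's default tick resolution of 1 ps (10^12 ticks per
simulated second); if the configuration sets a different global frequency,
the results change accordingly. The helper names are illustrative only.

```python
# Assumption: 1 tick = 1 ps, i.e. 10**12 ticks per simulated second.
TICKS_PER_SECOND = 10**12


def ticks_to_seconds(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a tick count into simulated seconds."""
    return ticks / ticks_per_second


def ms_to_ticks(ms, ticks_per_second=TICKS_PER_SECOND):
    """Convert whole milliseconds into ticks using integer arithmetic."""
    return ms * ticks_per_second // 1000


stuck_ticks = 195_498_501_784_500  # time CPU0 spent in the loop (from above)
timer_period_ticks = ms_to_ticks(4)  # apic_timer_interrupt() period of 4 ms

print("simulated seconds in loop:", ticks_to_seconds(stuck_ticks))
print("timer periods elapsed:", stuck_ticks // timer_period_ticks)
```

Under that assumption, CPU0 spends about 195.5 simulated seconds in the loop,
which is long enough for roughly 48,874 timer periods of 4 ms on the other
CPUs.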


I have tried versions e5936c2d53a0 and a0cb57e1c072, and both show these
problems.

Do you have any idea?

thanks,
Zehan

On Tue, Jan 20, 2015 at 5:01 PM, Zehan Cui <zehan....@gmail.com> wrote:

> Hi Andreas,
>
> The atomic CPU and o3 CPU do execute the same instructions, except that
> the atomic CPU exits the above loop soon, while the o3 CPU is stuck in the
> loop for at least one billion instructions. My last mail showed only the
> instructions of the stuck loop. The whole instruction sequence contains:
> @page_fault
> @error_entry
> @do_page_fault
> @find_vma
> @__handle_mm_fault
> @filemap_nopage
> @find_get_page
> ...
> This instruction sequence looks like page fault handling.
>
> There is an uninitialized static array in the source code, which may
> cause the page fault. I'll initialize the array before the checkpoint
> and see what happens. But it's still strange that the o3 CPU cannot exit
> the loop for such a long time.
>
> Thanks,
> Zehan
>
>
> On Tue, Jan 20, 2015 at 4:40 PM, Andreas Hansson <andreas.hans...@arm.com>
> wrote:
>
>>  Hi Zehan,
>>
>>  The o3 CPU will invariably take roughly 5-10x as long due to the level
>> of detail. Are you suggesting the atomic CPU and the o3 CPU are not
>> executing the same instructions?
>>
>>  Typically in these cases you want to drop a checkpoint before the
>> region of interest.
>>
>>  Andreas
>>
>>   From: Zehan Cui via gem5-users <gem5-users@gem5.org>
>> Reply-To: Zehan Cui <zehan....@gmail.com>, gem5 users mailing list <
>> gem5-users@gem5.org>
>> Date: Tuesday, 20 January 2015 02:14
>> To: gem5-users <gem5-users@gem5.org>
>> Subject: [gem5-users] one cpu keeps executing "@flush_tlb_others+133"
>>
>>  Hi all,
>>
>>  I am running a multi-threaded application in full-system mode with the
>> detailed CPU model. I extracted the instruction traces of each CPU and found
>> that the last CPU keeps executing instructions like the following for at
>> least 1 billion instructions (max_instructions is set to 1 billion).
>>
>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+133    :   NOP                      : IntAlu :
>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp   DS:[rbp], 0
>> 8007580397289: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz   0xfffffffffffffff8
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+133    :   NOP                      : IntAlu :
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp   DS:[rbp], 0
>> 8007580397602: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz   0xfffffffffffffff8
>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+133    :   NOP                      : IntAlu :
>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+135    : cmp   DS:[rbp], 0
>> 8007580397915: system.switch_cpus_17 T0 : @flush_tlb_others+139    : jnz   0xfffffffffffffff8
>> 8007580398228: system.switch_cpus_17 T0 : @flush_tlb_others+133    :   NOP                      : IntAlu :
>>
>>
>>  I also ran the application with the atomic CPU model. The same instruction
>> sequence appears for a while, but execution soon switches to the
>> application's instructions.
>>
>>  This problem has bothered me for a while. Does anyone understand it?
>>
>>  thanks,
>> zehan
>>
>>
>
>

