On 27-09-2017 18:50, Dilger, Andreas wrote:
> On Sep 26, 2017, at 01:10, Hans Henrik Happe <[email protected]> wrote:
>> Hi,
>>
>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>> lustre'? On an otherwise idle node I see:
>>
>>  PID USER      PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>>    9 root      20   0     0    0    0 S 28.5  0.0  2:05.58 ksoftirqd/1
>>
>>
>>   57 root      20   0     0    0    0 R 23.9  0.0  2:22.91 ksoftirqd/13
>>
>> The sum of those two is about 50% CPU.
>>
>> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
>>
>> I also tested the 2.10.1-RC1, which is the same.
> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
> that are currently running on the CPUs of your system to the console (and
> also /var/log/messages, if it can write everything in time).
>
> You might need to do this several times to get a representative sample of
> the ksoftirqd process stacks to see what they are doing that is consuming
> so much CPU.
>
> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
> processes to the console (and /v/l/m), but there will be a lot of them,
> and no better chance that it catches what ksoftirqd is doing 25% of the time.
I've attached the stacks. Some wakeup which I guess are initiated by
something in the ptlrpc code.

Cheers,
Hans Henrik


Sep 30 23:26:01 node033 kernel: NMI backtrace for cpu 11
Sep 30 23:26:01 node033 kernel: CPU 11 <d>Modules linked in: lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl mmfs(U) mmfslinux(U) tracedev(U) autofs4 sha512_generic cr
c32c_intel sunrpc acpi_cpufreq freq_table mperf bonding ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 xfs exportfs ext2 ipmi_devintf ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support raid0 i
gb dca i2c_algo_bit i2c_core raid1 ptp pps_core sg serio_raw lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sd_mod crc_t10dif ahci hpsa dm_mirror dm_region_h
ash dm_log dm_mod [last unloaded: libcfs]
Sep 30 23:26:01 node033 kernel: 
Sep 30 23:26:01 node033 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1<c> HP ProLiant DL170e G6  <c>/ProLiant DL170e G6  <c>
Sep 30 23:26:01 node033 kernel: RIP: 0010:[<ffffffff812fdb01>]  [<ffffffff812fdb01>] intel_idle+0xd1/0x1b0
Sep 30 23:26:01 node033 kernel: RSP: 0018:ffff880c2d067e48  EFLAGS: 00000046
Sep 30 23:26:01 node033 kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
Sep 30 23:26:01 node033 kernel: RDX: 0000000000000000 RSI: ffff880c2d067fd8 RDI: ffffffff81a98580
Sep 30 23:26:01 node033 kernel: RBP: ffff880c2d067ed8 R08: 0000000000000005 R09: 00000000000000c8
Sep 30 23:26:01 node033 kernel: R10: 0011f630c009c747 R11: ffff880c2d067e68 R12: 0000000000000004
Sep 30 23:26:01 node033 kernel: R13: ffff880c6a6beb40 R14: 0000000000000020 R15: 14e940c25ce4ad4a
Sep 30 23:26:01 node033 kernel: FS:  0000000000000000(0000) GS:ffff880c6a6a0000(0000) knlGS:0000000000000000
Sep 30 23:26:01 node033 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 30 23:26:01 node033 kernel: CR2: 00007fe396313000 CR3: 0000000001a8d000 CR4: 00000000000007e0
Sep 30 23:26:01 node033 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 30 23:26:01 node033 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 30 23:26:01 node033 kernel: Process swapper (pid: 0, threadinfo ffff880c2d064000, task ffff880c2d052040)
Sep 30 23:26:01 node033 kernel: Stack:
Sep 30 23:26:01 node033 kernel: ffff88182d471020 ffff880c6a6b1b80 000000000000000b 0011f630b5b0c60b
Sep 30 23:26:01 node033 kernel: ffff880c00000002 0000000000000282 0000000000000000 000000000000804f
Sep 30 23:26:01 node033 kernel: ffff880c2d067ed8 ffffffff81444bb4 0000000000000000 0000000b0a59013c
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: [<ffffffff81444bb4>] ? menu_select+0x174/0x390
Sep 30 23:26:01 node033 kernel: [<ffffffff81443aaa>] cpuidle_idle_call+0x7a/0xe0
Sep 30 23:26:01 node033 kernel: [<ffffffff81009fe6>] cpu_idle+0xb6/0x110
Sep 30 23:26:01 node033 kernel: [<ffffffff81543ec9>] start_secondary+0x2c0/0x316
Sep 30 23:26:01 node033 kernel: Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 f0 0f 01 c9 <e8> 5a 3f db ff 4c 29 f8 48 89 c7 e8 7f 61 d8 ff 49 89 c6 49 89 
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: <#DB[1]>  <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: <NMI>  [<ffffffff81009c47>] ? show_regs+0x27/0x30
Sep 30 23:26:01 node033 kernel: [<ffffffff8154f819>] ? arch_trigger_all_cpu_backtrace_handler+0x99/0xc0
Sep 30 23:26:01 node033 kernel: [<ffffffff815510b5>] ? notifier_call_chain+0x55/0x80
Sep 30 23:26:01 node033 kernel: [<ffffffff8155111a>] ? atomic_notifier_call_chain+0x1a/0x20
Sep 30 23:26:01 node033 kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30
Sep 30 23:26:01 node033 kernel: [<ffffffff8154ec39>] ? do_nmi+0xd9/0x360
Sep 30 23:26:01 node033 kernel: [<ffffffff8154e5e3>] ? nmi+0x83/0x90
...skipping...
Sep 30 23:37:19 node033 kernel: [<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
Sep 30 23:37:19 node033 kernel: [<ffffffff810b8dc7>] ? update_ts_time_stats+0x67/0xa0
Sep 30 23:37:19 node033 kernel: [<ffffffff810b8e3e>] ? tick_nohz_stop_idle+0x3e/0x50
Sep 30 23:37:19 node033 kernel: [<ffffffff8100a01e>] cpu_idle+0xee/0x110
Sep 30 23:37:19 node033 kernel: [<ffffffff81543ec9>] start_secondary+0x2c0/0x316
Sep 30 23:37:19 node033 kernel: Code: 00 49 8b be c8 04 00 00 41 8b b6 d0 04 00 00 e8 93 b8 af ff 66 90 e9 3a f8 ff ff 48 8b 45 a0 4c 89 b0 88 04 00 00 f0 41 ff 46 5c <65> 8b 04 25 88 6e 01 00 83 f8 01 0f 85 19 f8 ff ff 65 c7 04 25 
Sep 30 23:37:19 node033 kernel: Call Trace:
Sep 30 23:37:19 node033 kernel: <#DB[1]>  <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1
Sep 30 23:37:19 node033 kernel: Call Trace:
Sep 30 23:37:19 node033 kernel: <NMI>  [<ffffffff81009c47>] ? show_regs+0x27/0x30
Sep 30 23:37:19 node033 kernel: [<ffffffff8154f819>] ? arch_trigger_all_cpu_backtrace_handler+0x99/0xc0
Sep 30 23:37:19 node033 kernel: [<ffffffff815510b5>] ? notifier_call_chain+0x55/0x80
Sep 30 23:37:19 node033 kernel: [<ffffffff8155111a>] ? atomic_notifier_call_chain+0x1a/0x20
Sep 30 23:37:19 node033 kernel: [<ffffffff810acd0e>] ? notify_die+0x2e/0x30
Sep 30 23:37:19 node033 kernel: [<ffffffff8154ec39>] ? do_nmi+0xd9/0x360
Sep 30 23:37:19 node033 kernel: [<ffffffff8154e5e3>] ? nmi+0x83/0x90
Sep 30 23:37:19 node033 kernel: [<ffffffff8154b074>] ? schedule+0xb44/0xb70
Sep 30 23:37:19 node033 kernel: <<EOE>>  [<ffffffff8106c4f5>] ? wake_up_process+0x15/0x20
Sep 30 23:37:19 node033 kernel: [<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
Sep 30 23:37:19 node033 kernel: [<ffffffff810b8dc7>] ? update_ts_time_stats+0x67/0xa0
Sep 30 23:37:19 node033 kernel: [<ffffffff810b8e3e>] ? tick_nohz_stop_idle+0x3e/0x50
Sep 30 23:37:19 node033 kernel: [<ffffffff8100a01e>] ? cpu_idle+0xee/0x110
Sep 30 23:37:19 node033 kernel: [<ffffffff81543ec9>] ? start_secondary+0x2c0/0x316

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Reply via email to