On 27-09-2017 18:50, Dilger, Andreas wrote:
> On Sep 26, 2017, at 01:10, Hans Henrik Happe <[email protected]> wrote:
>> Hi,
>>
>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>> lustre'? On an otherwise idle node I see:
>>
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
>>     9 root  20   0     0    0    0 S 28.5  0.0 2:05.58 ksoftirqd/1
>>    57 root  20   0     0    0    0 R 23.9  0.0 2:22.91 ksoftirqd/13
>>
>> The sum of those two is about 50% CPU.
>>
>> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
>>
>> I also tested 2.10.1-RC1, which behaves the same.
>
> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
> that are currently running on the CPUs of your system to the console (and
> also /var/log/messages, if it can write everything in time).
>
> You might need to do this several times to get a representative sample of
> the ksoftirqd process stacks to see what they are doing that is consuming
> so much CPU.
>
> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
> processes to the console (and /v/l/m), but there will be a lot of them,
> and no better chance that it catches what ksoftirqd is doing 25% of the time.

I've attached the stacks. They show wakeups which I guess are initiated by something in the ptlrpc code.
Cheers,
Hans Henrik
Sep 30 23:26:01 node033 kernel: NMI backtrace for cpu 11
Sep 30 23:26:01 node033 kernel: CPU 11
Sep 30 23:26:01 node033 kernel: Modules linked in: lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl mmfs(U) mmfslinux(U) tracedev(U) autofs4 sha512_generic crc32c_intel sunrpc acpi_cpufreq freq_table mperf bonding ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 xfs exportfs ext2 ipmi_devintf ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support raid0 igb dca i2c_algo_bit i2c_core raid1 ptp pps_core sg serio_raw lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sd_mod crc_t10dif ahci hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Sep 30 23:26:01 node033 kernel:
Sep 30 23:26:01 node033 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1 HP ProLiant DL170e G6/ProLiant DL170e G6
Sep 30 23:26:01 node033 kernel: RIP: 0010:[<ffffffff812fdb01>]  [<ffffffff812fdb01>] intel_idle+0xd1/0x1b0
Sep 30 23:26:01 node033 kernel: RSP: 0018:ffff880c2d067e48  EFLAGS: 00000046
Sep 30 23:26:01 node033 kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
Sep 30 23:26:01 node033 kernel: RDX: 0000000000000000 RSI: ffff880c2d067fd8 RDI: ffffffff81a98580
Sep 30 23:26:01 node033 kernel: RBP: ffff880c2d067ed8 R08: 0000000000000005 R09: 00000000000000c8
Sep 30 23:26:01 node033 kernel: R10: 0011f630c009c747 R11: ffff880c2d067e68 R12: 0000000000000004
Sep 30 23:26:01 node033 kernel: R13: ffff880c6a6beb40 R14: 0000000000000020 R15: 14e940c25ce4ad4a
Sep 30 23:26:01 node033 kernel: FS:  0000000000000000(0000) GS:ffff880c6a6a0000(0000) knlGS:0000000000000000
Sep 30 23:26:01 node033 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 30 23:26:01 node033 kernel: CR2: 00007fe396313000 CR3: 0000000001a8d000 CR4: 00000000000007e0
Sep 30 23:26:01 node033 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 30 23:26:01 node033 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 30 23:26:01 node033 kernel: Process swapper (pid: 0, threadinfo ffff880c2d064000, task ffff880c2d052040)
Sep 30 23:26:01 node033 kernel: Stack:
Sep 30 23:26:01 node033 kernel:  ffff88182d471020 ffff880c6a6b1b80 000000000000000b 0011f630b5b0c60b
Sep 30 23:26:01 node033 kernel:  ffff880c00000002 0000000000000282 0000000000000000 000000000000804f
Sep 30 23:26:01 node033 kernel:  ffff880c2d067ed8 ffffffff81444bb4 0000000000000000 0000000b0a59013c
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel:  [<ffffffff81444bb4>] ? menu_select+0x174/0x390
Sep 30 23:26:01 node033 kernel:  [<ffffffff81443aaa>] cpuidle_idle_call+0x7a/0xe0
Sep 30 23:26:01 node033 kernel:  [<ffffffff81009fe6>] cpu_idle+0xb6/0x110
Sep 30 23:26:01 node033 kernel:  [<ffffffff81543ec9>] start_secondary+0x2c0/0x316
Sep 30 23:26:01 node033 kernel: Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 f0 0f 01 c9 <e8> 5a 3f db ff 4c 29 f8 48 89 c7 e8 7f 61 d8 ff 49 89 c6 49 89
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: <#DB[1]> <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: <NMI>  [<ffffffff81009c47>] ? show_regs+0x27/0x30
Sep 30 23:26:01 node033 kernel:  [<ffffffff8154f819>] ? arch_trigger_all_cpu_backtrace_handler+0x99/0xc0
Sep 30 23:26:01 node033 kernel:  [<ffffffff815510b5>] ? notifier_call_chain+0x55/0x80
Sep 30 23:26:01 node033 kernel:  [<ffffffff8155111a>] ? atomic_notifier_call_chain+0x1a/0x20
Sep 30 23:26:01 node033 kernel:  [<ffffffff810acd0e>] ? notify_die+0x2e/0x30
Sep 30 23:26:01 node033 kernel:  [<ffffffff8154ec39>] ? do_nmi+0xd9/0x360
Sep 30 23:26:01 node033 kernel:  [<ffffffff8154e5e3>] ? nmi+0x83/0x90
...skipping...
Sep 30 23:37:19 node033 kernel:  [<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
Sep 30 23:37:19 node033 kernel:  [<ffffffff810b8dc7>] ? update_ts_time_stats+0x67/0xa0
Sep 30 23:37:19 node033 kernel:  [<ffffffff810b8e3e>] ? tick_nohz_stop_idle+0x3e/0x50
Sep 30 23:37:19 node033 kernel:  [<ffffffff8100a01e>] cpu_idle+0xee/0x110
Sep 30 23:37:19 node033 kernel:  [<ffffffff81543ec9>] start_secondary+0x2c0/0x316
Sep 30 23:37:19 node033 kernel: Code: 00 49 8b be c8 04 00 00 41 8b b6 d0 04 00 00 e8 93 b8 af ff 66 90 e9 3a f8 ff ff 48 8b 45 a0 4c 89 b0 88 04 00 00 f0 41 ff 46 5c <65> 8b 04 25 88 6e 01 00 83 f8 01 0f 85 19 f8 ff ff 65 c7 04 25
Sep 30 23:37:19 node033 kernel: Call Trace:
Sep 30 23:37:19 node033 kernel: <#DB[1]> <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1
Sep 30 23:37:19 node033 kernel: Call Trace:
Sep 30 23:37:19 node033 kernel: <NMI>  [<ffffffff81009c47>] ? show_regs+0x27/0x30
Sep 30 23:37:19 node033 kernel:  [<ffffffff8154f819>] ? arch_trigger_all_cpu_backtrace_handler+0x99/0xc0
Sep 30 23:37:19 node033 kernel:  [<ffffffff815510b5>] ? notifier_call_chain+0x55/0x80
Sep 30 23:37:19 node033 kernel:  [<ffffffff8155111a>] ? atomic_notifier_call_chain+0x1a/0x20
Sep 30 23:37:19 node033 kernel:  [<ffffffff810acd0e>] ? notify_die+0x2e/0x30
Sep 30 23:37:19 node033 kernel:  [<ffffffff8154ec39>] ? do_nmi+0xd9/0x360
Sep 30 23:37:19 node033 kernel:  [<ffffffff8154e5e3>] ? nmi+0x83/0x90
Sep 30 23:37:19 node033 kernel:  [<ffffffff8154b074>] ? schedule+0xb44/0xb70
Sep 30 23:37:19 node033 kernel: <<EOE>>  [<ffffffff8106c4f5>] ? wake_up_process+0x15/0x20
Sep 30 23:37:19 node033 kernel:  [<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
Sep 30 23:37:19 node033 kernel:  [<ffffffff810b8dc7>] ? update_ts_time_stats+0x67/0xa0
Sep 30 23:37:19 node033 kernel:  [<ffffffff810b8e3e>] ? tick_nohz_stop_idle+0x3e/0x50
Sep 30 23:37:19 node033 kernel:  [<ffffffff8100a01e>] ? cpu_idle+0xee/0x110
Sep 30 23:37:19 node033 kernel:  [<ffffffff81543ec9>] ? start_secondary+0x2c0/0x316
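For anyone sifting through similar dumps: a quick way to see which functions the sampled stack frames land in most often is to extract the symbol names and count them. This is just a sketch; the here-doc below holds a few frames copied from the trace above, and in real use you would instead pipe in the matching slice of /var/log/messages.

```shell
# Tally the functions appearing in NMI backtrace stack frames.
# Frame lines look like:  [<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
# In practice, replace the here-doc with something like:
#   grep '\[<' /var/log/messages | sed -nE ... | sort | uniq -c | sort -rn
cat <<'EOF' |
[<ffffffff8106c4f5>] ? wake_up_process+0x15/0x20
[<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
[<ffffffff81084d95>] ? wakeup_softirqd+0x35/0x40
[<ffffffff810b8dc7>] ? update_ts_time_stats+0x67/0xa0
EOF
sed -nE 's/.*\[<[0-9a-f]+>\] (\? )?([A-Za-z_][A-Za-z_0-9.]*)\+.*/\2/p' |
    sort | uniq -c | sort -rn | head
# For this sample, wakeup_softirqd comes out on top with a count of 2.
```

With enough "echo l" samples, a function that dominates this tally (here the wakeup_softirqd/wake_up_process path out of the idle loop) points at where the CPU time is going.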
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
