Hi Doug,

I have seen the same message on some of our machines, but so far it hasn't caused any real performance problems. It doesn't seem to matter whether you are running SL5.5 as such, only that you are running one of the latest errata kernels. We have only seen it show up on SL5.3, but with the latest errata kernel.
Steve

On Mon, 30 Aug 2010, Doug Johnson wrote:
Greetings,

I am seeing the following message when an SL5.5 machine (with all of the most recent updates installed) is under load writing data to an NFS disk.

NOTE: It also occurs for processes other than kswapd0, so I don't think kswapd0 itself has anything to do with the issue.

Aug 30 18:25:21 se kernel: INFO: task kswapd0:220 blocked for more than 120 seconds.
Aug 30 18:25:21 se kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 30 18:25:21 se kernel: kswapd0 D ffff810003336420 0 220 36 221 219 (L-TLB)
Aug 30 18:25:21 se kernel: ffff810003be19e0 0000000000000046 ffff810037c9c200 ffff8100ae3c4000
Aug 30 18:25:21 se kernel: 0000000000000003 000000000000000a ffff810037f2a860 ffffffff80308b60
Aug 30 18:25:21 se kernel: 00000a919f5c3fe1 00000000002d7d53 ffff810037f2aa48 00000000c770f5f8
Aug 30 18:25:21 se kernel: Call Trace:
Aug 30 18:25:21 se kernel: [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90
Aug 30 18:25:21 se kernel: [<ffffffff886646e5>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
Aug 30 18:25:21 se kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67
Aug 30 18:25:21 se kernel: [<ffffffff886646ee>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
Aug 30 18:25:21 se kernel: [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
Aug 30 18:25:21 se kernel: [<ffffffff886646e5>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
Aug 30 18:25:21 se kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Aug 30 18:25:21 se kernel: [<ffffffff800a0a06>] wake_bit_function+0x0/0x23
Aug 30 18:25:21 se kernel: [<ffffffff88668106>] :nfs:nfs_wait_on_requests_locked+0x70/0xca
Aug 30 18:25:21 se kernel: [<ffffffff88669146>] :nfs:nfs_sync_inode_wait+0x60/0x1db
Aug 30 18:25:21 se kernel: [<ffffffff8865f234>] :nfs:nfs_release_page+0x2c/0x4d
Aug 30 18:25:21 se kernel: [<ffffffff800caea8>] shrink_inactive_list+0x511/0x8d8
Aug 30 18:25:21 se kernel: [<ffffffff800ca39b>] isolate_lru_pages+0x98/0xbf
Aug 30 18:25:21 se kernel: [<ffffffff80047e98>] __pagevec_release+0x19/0x22
Aug 30 18:25:21 se kernel: [<ffffffff800ca876>] shrink_active_list+0x4b4/0x4c4
Aug 30 18:25:21 se kernel: [<ffffffff800130f5>] shrink_zone+0x127/0x18d
Aug 30 18:25:21 se kernel: [<ffffffff80057b94>] kswapd+0x323/0x46c
Aug 30 18:25:21 se kernel: [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e
Aug 30 18:25:21 se kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
Aug 30 18:25:21 se kernel: [<ffffffff80057871>] kswapd+0x0/0x46c
Aug 30 18:25:21 se kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
Aug 30 18:25:21 se kernel: [<ffffffff8003287b>] kthread+0xfe/0x132
Aug 30 18:25:21 se kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Aug 30 18:25:21 se kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
Aug 30 18:25:21 se kernel: [<ffffffff8003277d>] kthread+0x0/0x132
Aug 30 18:25:21 se kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

I have seen this error with both an Intel Pro1000 and a Realtek Ethernet card. I am working with two other universities (completely different hardware), and they have both seen this message.

Prior to 5.5, this would result in the machine locking up. Now with 5.5 it appears that the load level on the machine slowly rises (I assume due to processes blocked in the D wait state), but the machine stays somewhat responsive. Also, once these messages occur, ps will hang and that session becomes unusable.

I don't know what this means, but a similarly configured machine with identical hardware running SL4.7 does not produce these errors, and the NFS throughput is pretty darn good.

Any help or pointers in some direction will be appreciated.

Thanks,
doug

----------------------------------------------------------------------------
Doug Johnson                  email: drj...@pizero.colorado.edu
B390, Duane Physics           (303)-492-4506 Office
Boulder, CO 80309             (303)-492-5119 FAX
http://www.aaccchildren.org
Tully, baby. Look around. It's a cage with golden bars.
----------------------------------------------------------------------------
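
For anyone comparing machines, the observation that the message shows up for tasks other than kswapd0 can be checked by tallying the "blocked for more than N seconds" lines per task name. The following is only a rough sketch: the function name tally_hung_tasks is made up for illustration, and the regular expression assumes the exact message wording shown in the log above, which may differ on other kernels.

```python
import re
from collections import Counter

# Matches the hung-task line format quoted in the log above, e.g.
#   "kernel: INFO: task kswapd0:220 blocked for more than 120 seconds."
# Assumption: other kernels may word this message differently.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def tally_hung_tasks(lines):
    """Count hung-task reports per task name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

To use it, pass an open syslog file, for example tally_hung_tasks(open("/var/log/messages")), where the path is an assumption about where kernel messages are logged on the machine in question. If several different task names appear in the result, that would support the view that kswapd0 is only one of the victims rather than the cause.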
--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.