Hello,

I'm administering a server that has frozen three times over the past two
days. Each time, most processes suddenly started hanging, and I couldn't
SSH into the server or even log in at the console. Shortly after the
processes hung, messages like "INFO: task kswapd0:28 blocked for more
than 120 seconds" started appearing on the console. The only way I could
get the server to respond again was to reset it.

I've posted a screenshot of one of the kernel messages displayed on the
console during each of the three freezes at:

http://innovacomputing.com/kernel-hangs/20100921-1442.png
http://innovacomputing.com/kernel-hangs/20100922-1442.png
http://innovacomputing.com/kernel-hangs/20100922-1720.png

Only the errors from the second freeze above were logged to the
filesystem. I'm pasting them here:

[85324.832015] INFO: task kswapd0:28 blocked for more than 120 seconds.
[85324.870132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85324.917059] kswapd0 D ffff880005515780 0 28 2 0x00000000
[85324.964614] ffff8800aa60e9f0 0000000000000046 0000000000000000 0000000000000246
[85325.009245] 000112008144ac20 000000000000f9e0 ffff88012e94ffd8 0000000000015780
[85325.053915] 0000000000015780 ffff88012fab3f90 ffff88012fab4288 000000012e94f6a0
[85325.098533] Call Trace:
[85325.113218] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85325.149714] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85325.188836] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85325.223764] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85325.262348] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85325.306149] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85325.342114] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85325.380703] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85325.422404] [<ffffffff8103555f>] ? flush_tlb_page+0x5a/0x7b
[85325.456309] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85325.494384] [<ffffffff8118fe6c>] ? radix_tree_delete+0xbf/0x1ba
[85325.530374] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85325.562193] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85325.594019] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85325.633646] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85325.671715] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85325.706139] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85325.742129] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85325.775503] [<ffffffff810b953c>] ? determine_dirtyable_memory+0xd/0x1d
[85325.815129] [<ffffffff810b95b4>] ? get_dirty_limits+0x1d/0x259
[85325.850624] [<ffffffff810592d8>] ? try_to_del_timer_sync+0x63/0x6c
[85325.888214] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85325.921634] [<ffffffffa015c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[85325.963343] [<ffffffff810bda68>] ? shrink_slab+0x141/0x153
[85325.996711] [<ffffffff810be28a>] ? kswapd+0x4b9/0x683
[85326.027508] [<ffffffff810bddd1>] ? kswapd+0x0/0x683
[85326.057264] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85326.094266] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85326.132862] [<ffffffff810397f6>] ? __wake_up_common+0x44/0x73
[85326.167806] [<ffffffff810bddd1>] ? kswapd+0x0/0x683
[85326.197548] [<ffffffff810635cd>] ? kthread+0x79/0x81
[85326.227801] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[85326.258593] [<ffffffff81063554>] ? kthread+0x0/0x81
[85326.288330] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[85326.319121] INFO: task kjournald:2223 blocked for more than 120 seconds.
[85326.359274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85326.406243] kjournald D 0000000000000002 0 2223 2 0x00000000
[85326.447615] ffff88012c0f4db0 0000000000000046 0000000000000001 0000000000000286
[85326.492332] 0000000000000003 000000000000f9e0 ffff88012962dfd8 0000000000015780
[85326.536943] 0000000000015780 ffff88012c935bd0 ffff88012c935ec8 00000000a01362ec
[85326.581609] Call Trace:
[85326.596282] [<ffffffff81016539>] ? read_tsc+0xa/0x20
[85326.626562] [<ffffffff8110c2bc>] ? sync_buffer+0x0/0x40
[85326.658382] [<ffffffff812f7ae8>] ? io_schedule+0x73/0xb7
[85326.690733] [<ffffffff8110c2f7>] ? sync_buffer+0x3b/0x40
[85326.723063] [<ffffffff812f7ff5>] ? __wait_on_bit+0x41/0x70
[85326.756446] [<ffffffff8110c2bc>] ? sync_buffer+0x0/0x40
[85326.788268] [<ffffffff812f808f>] ? out_of_line_wait_on_bit+0x6b/0x77
[85326.826867] [<ffffffff810638c8>] ? wake_bit_function+0x0/0x23
[85326.861803] [<ffffffffa01661cd>] ? journal_commit_transaction+0x508/0xe2b [jbd]
[85326.906115] [<ffffffff81059250>] ? lock_timer_base+0x26/0x4b
[85326.940544] [<ffffffffa0169413>] ? kjournald+0xdf/0x226 [jbd]
[85326.975474] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85327.014066] [<ffffffffa0169334>] ? kjournald+0x0/0x226 [jbd]
[85327.048489] [<ffffffff810635cd>] ? kthread+0x79/0x81
[85327.078746] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[85327.109533] [<ffffffff81063554>] ? kthread+0x0/0x81
[85327.139276] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[85327.170061] INFO: task flush-147:3:2224 blocked for more than 120 seconds.
[85327.211245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85327.258168] flush-147:3 D ffff88000550fb30 0 2224 2 0x00000000
[85327.299553] ffff88012c930000 0000000000000046 ffff88012d601720 ffff88012d60171c
[85327.344264] 0000000000000000 000000000000f9e0 ffff88012d601fd8 0000000000015780
[85327.388932] 0000000000015780 ffff88012c930710 ffff88012c930a08 0000000100015780
[85327.433545] Call Trace:
[85327.448216] [<ffffffff812f7dcb>] ? schedule_timeout+0x2e/0xdd
[85327.483187] [<ffffffff812f7c83>] ? wait_for_common+0xde/0x15b
[85327.518108] [<ffffffff81048ecd>] ? default_wake_function+0x0/0x9
[85327.554658] [<ffffffffa02278a9>] ? drbd_al_begin_io+0x13f/0x195 [drbd]
[85327.594335] [<ffffffffa02288ca>] ? w_al_write_transaction+0x0/0x2d6 [drbd]
[85327.636061] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85327.671008] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85327.714853] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85327.750830] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85327.789414] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85327.831109] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85327.869172] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85327.900993] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85327.932830] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85327.972460] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85328.010518] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85328.044932] [<ffffffff810b8c22>] ? __writepage+0xa/0x25
[85328.076771] [<ffffffff810b92a9>] ? write_cache_pages+0x20b/0x327
[85328.113263] [<ffffffff810b8c18>] ? __writepage+0x0/0x25
[85328.145109] [<ffffffff8110606e>] ? writeback_single_inode+0xe7/0x2da
[85328.183675] [<ffffffff81106d74>] ? writeback_inodes_wb+0x424/0x4ff
[85328.221243] [<ffffffff81106f7b>] ? wb_writeback+0x12c/0x1ab
[85328.255131] [<ffffffff81107115>] ? wb_do_writeback+0x73/0x165
[85328.290071] [<ffffffff81107238>] ? bdi_writeback_task+0x31/0xaa
[85328.326061] [<ffffffff810c744e>] ? bdi_start_fn+0x0/0xd2
[85328.358396] [<ffffffff810c74be>] ? bdi_start_fn+0x70/0xd2
[85328.391259] [<ffffffff810c744e>] ? bdi_start_fn+0x0/0xd2
[85328.423604] [<ffffffff810635cd>] ? kthread+0x79/0x81
[85328.453884] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[85328.484659] [<ffffffff81063554>] ? kthread+0x0/0x81
[85328.514397] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[85328.545207] INFO: task postgres:10183 blocked for more than 120 seconds.
[85328.585329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85328.632232] postgres D ffff880005515780 0 10183 2958 0x00000000
[85328.673576] ffff880069902a60 0000000000000082 ffff88010c0913c8 ffff88010c0913c8
[85328.718186] ffff88010c0913d8 000000000000f9e0 ffff88010c091fd8 0000000000015780
[85328.762802] 0000000000015780 ffff88010a488710 ffff88010a488a08 0000000100016640
[85328.807424] Call Trace:
[85328.822094] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85328.858600] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85328.897700] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85328.932668] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85328.971245] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85329.015094] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85329.051076] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85329.089657] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85329.131365] [<ffffffff8103555f>] ? flush_tlb_page+0x5a/0x7b
[85329.165258] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85329.203329] [<ffffffff8118fe6c>] ? radix_tree_delete+0xbf/0x1ba
[85329.239330] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85329.271179] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85329.302999] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85329.342623] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85329.380689] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85329.415112] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85329.451113] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85329.484484] [<ffffffff810b95b4>] ? get_dirty_limits+0x1d/0x259
[85329.519954] [<ffffffff81046cfc>] ? finish_task_switch+0x3a/0xaf
[85329.555938] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85329.589320] [<ffffffffa015c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[85329.631032] [<ffffffff810bda68>] ? shrink_slab+0x141/0x153
[85329.664409] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85329.700911] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85329.737937] [<ffffffff810c6938>] ? congestion_wait+0x74/0x80
[85329.772380] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85329.810955] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85329.850067] [<ffffffff810ba06d>] ? __do_page_cache_readahead+0x9b/0x1b4
[85329.890214] [<ffffffff810ba1a2>] ? ra_submit+0x1c/0x20
[85329.921502] [<ffffffff810b2f3a>] ? filemap_fault+0x17d/0x2f6
[85329.955941] [<ffffffff810c8cae>] ? __do_fault+0x54/0x3c3
[85329.988286] [<ffffffff810caf66>] ? handle_mm_fault+0x351/0x7a5
[85330.023738] [<ffffffff8106ad6d>] ? ktime_get_ts+0x68/0xb2
[85330.056624] [<ffffffff812fb286>] ? do_page_fault+0x2e0/0x2fc
[85330.091043] [<ffffffff812f9125>] ? page_fault+0x25/0x30
[85330.122857] INFO: task postgres:12592 blocked for more than 120 seconds.
[85330.163009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85330.209962] postgres D ffff880005515780 0 12592 2958 0x00000000
[85330.251291] ffff88010a489530 0000000000000082 ffff88010cf19458 ffff88010cf19454
[85330.295909] ffffffff8144ac20 000000000000f9e0 ffff88010cf19fd8 0000000000015780
[85330.340577] 0000000000015780 ffff88010c0469f0 ffff88010c046ce8 000000010cf19448
[85330.385189] Call Trace:
[85330.399863] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85330.436371] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85330.475475] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85330.510425] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85330.549008] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85330.592790] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85330.634906] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85330.673496] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85330.715202] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85330.753266] [<ffffffff8118fe6c>] ? radix_tree_delete+0xbf/0x1ba
[85330.789269] [<ffffffff810d2638>] ? page_referenced_one+0x8c/0x10d
[85330.826296] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85330.858123] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85330.889942] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85330.929569] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85330.967636] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85331.002055] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85331.038035] [<ffffffff8103ebd6>] ? update_curr+0xa6/0x147
[85331.070898] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85331.104281] [<ffffffff810b953c>] ? determine_dirtyable_memory+0xd/0x1d
[85331.143911] [<ffffffff810b95b4>] ? get_dirty_limits+0x1d/0x259
[85331.179373] [<ffffffff810592d8>] ? try_to_del_timer_sync+0x63/0x6c
[85331.216918] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85331.250298] [<ffffffffa015c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[85331.292000] [<ffffffff810bda68>] ? shrink_slab+0x141/0x153
[85331.325379] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85331.361886] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85331.398909] [<ffffffff810c6938>] ? congestion_wait+0x74/0x80
[85331.433345] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85331.471929] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85331.511035] [<ffffffff812f7b08>] ? io_schedule+0x93/0xb7
[85331.543384] [<ffffffff810ba06d>] ? __do_page_cache_readahead+0x9b/0x1b4
[85331.583531] [<ffffffff810638c8>] ? wake_bit_function+0x0/0x23
[85331.618469] [<ffffffff810ba1a2>] ? ra_submit+0x1c/0x20
[85331.649793] [<ffffffff810b2f3a>] ? filemap_fault+0x17d/0x2f6
[85331.684247] [<ffffffff810c8cae>] ? __do_fault+0x54/0x3c3
[85331.716594] [<ffffffff810caf66>] ? handle_mm_fault+0x351/0x7a5
[85331.752082] [<ffffffff812fb286>] ? do_page_fault+0x2e0/0x2fc
[85331.786542] [<ffffffff812f9125>] ? page_fault+0x25/0x30
[85331.818400] INFO: task apache2:2174 blocked for more than 120 seconds.
[85331.857551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85331.904471] apache2 D ffff880005515780 0 2174 3161 0x00000000
[85331.945809] ffff88010cf4a350 0000000000000086 0000000000000000 ffffffff810b4254
[85331.990468] 00011200ffffffff 000000000000f9e0 ffff8800c49fdfd8 0000000000015780
[85332.035034] 0000000000015780 ffff880069902a60 ffff880069902d58 00000001810b4254
[85332.079599] Call Trace:
[85332.094271] [<ffffffff810b4254>] ? mempool_alloc+0x55/0x106
[85332.128177] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85332.164682] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85332.203788] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85332.238741] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85332.277321] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85332.321119] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85332.357097] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85332.395683] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85332.437377] [<ffffffff8103555f>] ? flush_tlb_page+0x5a/0x7b
[85332.471299] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85332.509363] [<ffffffff8118feaf>] ? radix_tree_delete+0x102/0x1ba
[85332.545862] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85332.577686] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85332.609510] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85332.649152] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85332.687193] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85332.721633] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85332.757604] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85332.791068] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85332.824557] [<ffffffff810c6938>] ? congestion_wait+0x74/0x80
[85332.859099] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85332.897765] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85332.934415] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85332.971471] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85333.010691] [<ffffffff810ba06d>] ? __do_page_cache_readahead+0x9b/0x1b4
[85333.050909] [<ffffffff810ba1a2>] ? ra_submit+0x1c/0x20
[85333.082239] [<ffffffff810b2f3a>] ? filemap_fault+0x17d/0x2f6
[85333.116716] [<ffffffff810c8cae>] ? __do_fault+0x54/0x3c3
[85333.149102] [<ffffffff810caf66>] ? handle_mm_fault+0x351/0x7a5
[85333.184615] [<ffffffff81287718>] ? tcp_write_xmit+0x883/0x96c
[85333.219701] [<ffffffff8106ad6d>] ? ktime_get_ts+0x68/0xb2
[85333.252673] [<ffffffff812fb286>] ? do_page_fault+0x2e0/0x2fc
[85333.287174] [<ffffffff812f9125>] ? page_fault+0x25/0x30
[85333.319101] INFO: task apache2:2311 blocked for more than 120 seconds.
[85333.358301] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85333.405303] apache2 D ffff880005415780 0 2311 3161 0x00000000
[85333.446968] ffff8800ac920710 0000000000000086 0000000000000000 ffff8800998b2000
[85333.491588] 0000000000000010 000000000000f9e0 ffff8800998b3fd8 0000000000015780
[85333.536158] 0000000000015780 ffff88004e4de2e0 ffff88004e4de5d8 000000008118c2ae
[85333.580723] Call Trace:
[85333.595411] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85333.631902] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85333.671002] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85333.705948] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85333.744534] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85333.788329] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85333.824310] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85333.862890] [<ffffffffa0260ab9>] ? ipt_do_table+0x5ee/0x621 [ip_tables]
[85333.903050] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85333.944755] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85333.982819] [<ffffffff8118feaf>] ? radix_tree_delete+0x102/0x1ba
[85334.019319] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85334.051131] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85334.082960] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85334.122585] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85334.160647] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85334.195071] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85334.231063] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85334.264439] [<ffffffff810b953c>] ? determine_dirtyable_memory+0xd/0x1d
[85334.304064] [<ffffffff810b95b4>] ? get_dirty_limits+0x1d/0x259
[85334.339532] [<ffffffff8105232e>] ? _local_bh_enable_ip+0x7d/0x8f
[85334.376048] [<ffffffff8127bd33>] ? tcp_recvmsg+0x98b/0xa9e
[85334.409419] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85334.442835] [<ffffffffa015c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[85334.484639] [<ffffffff810bda68>] ? shrink_slab+0x141/0x153
[85334.518003] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85334.554522] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85334.591553] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85334.630645] [<ffffffff8123df1d>] ? sockfd_lookup_light+0x1a/0x51
[85334.667155] [<ffffffff810cae28>] ? handle_mm_fault+0x213/0x7a5
[85334.702604] [<ffffffff810d00ce>] ? do_brk+0x227/0x307
[85334.733388] [<ffffffff812fb286>] ? do_page_fault+0x2e0/0x2fc
[85334.767817] [<ffffffff812f9125>] ? page_fault+0x25/0x30
[85334.799650] INFO: task apache2:2318 blocked for more than 120 seconds.
[85334.838751] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85334.885650] apache2 D ffff880005515780 0 2318 3161 0x00000008
[85334.927038] ffff880095c8d4c0 0000000000000086 ffff8800aa0955e8 ffff8800aa0955e4
[85334.971651] 00011200ffffffff 000000000000f9e0 ffff8800aa095fd8 0000000000015780
[85335.016273] 0000000000015780 ffff8800aa60e9f0 ffff8800aa60ece8 00000001810b4254
[85335.060992] Call Trace:
[85335.075680] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85335.112209] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85335.151351] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85335.186337] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85335.224919] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85335.268757] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85335.304740] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85335.343329] [<ffffffff8118c2ae>] ? cpumask_next_and+0x2a/0x3a
[85335.378267] [<ffffffff810394f7>] ? scale_rt_power+0x1f/0x64
[85335.412174] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85335.453885] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85335.491935] [<ffffffff8118fe6c>] ? radix_tree_delete+0xbf/0x1ba
[85335.527929] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85335.559746] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85335.591580] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85335.631193] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85335.669253] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85335.703680] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85335.739663] [<ffffffff810bce8f>] ? shrink_active_list+0x2b4/0x2d9
[85335.776689] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85335.810071] [<ffffffff810b95b4>] ? get_dirty_limits+0x1d/0x259
[85335.845554] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85335.878936] [<ffffffff810c6938>] ? congestion_wait+0x74/0x80
[85335.913361] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85335.951940] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85335.988456] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85336.025475] [<ffffffff810c6938>] ? congestion_wait+0x74/0x80
[85336.059889] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85336.099005] [<ffffffff810baac1>] ? ____pagevec_lru_add+0x160/0x176
[85336.136549] [<ffffffff810cae28>] ? handle_mm_fault+0x213/0x7a5
[85336.172016] [<ffffffff81046cfc>] ? finish_task_switch+0x3a/0xaf
[85336.207999] [<ffffffff812fb286>] ? do_page_fault+0x2e0/0x2fc
[85336.242421] [<ffffffff812f9125>] ? page_fault+0x25/0x30
[85336.274263] INFO: task apache2:2711 blocked for more than 120 seconds.
[85336.313345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85336.360250] apache2 D ffff880005415780 0 2711 3161 0x00000000
[85336.401637] ffff88000961e2e0 0000000000000086 0000000000000000 0000000000000246
[85336.446250] 00011200ffffffff 000000000000f9e0 ffff88011193ffd8 0000000000015780
[85336.490814] 0000000000015780 ffff88000961bf90 ffff88000961c288 00000000810e2aaf
[85336.535431] Call Trace:
[85336.556189] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85336.592697] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85336.631797] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85336.666753] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85336.705341] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85336.749190] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85336.785159] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85336.823754] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85336.865452] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85336.903519] [<ffffffff8118fe6c>] ? radix_tree_delete+0xbf/0x1ba
[85336.939503] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85336.971335] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85337.003140] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85337.042800] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85337.080911] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85337.115362] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85337.151354] [<ffffffff810bce8f>] ? shrink_active_list+0x2b4/0x2d9
[85337.188381] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85337.221758] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85337.255140] [<ffffffffa015c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[85337.296874] [<ffffffff810bda68>] ? shrink_slab+0x141/0x153
[85337.330325] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85337.366851] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85337.403878] [<ffffffff81046cfc>] ? finish_task_switch+0x3a/0xaf
[85337.439867] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85337.478961] [<ffffffff8103ef85>] ? check_preempt_wakeup+0x1cd/0x268
[85337.517030] [<ffffffff810ba06d>] ? __do_page_cache_readahead+0x9b/0x1b4
[85337.557165] [<ffffffff810ba1a2>] ? ra_submit+0x1c/0x20
[85337.588466] [<ffffffff810b2f3a>] ? filemap_fault+0x17d/0x2f6
[85337.622930] [<ffffffff810c8cae>] ? __do_fault+0x54/0x3c3
[85337.655291] [<ffffffff8104088c>] ? pick_next_task_fair+0xcd/0xd8
[85337.691811] [<ffffffff8103f2bb>] ? set_next_entity+0x34/0x56
[85337.726229] [<ffffffff810caf66>] ? handle_mm_fault+0x351/0x7a5
[85337.761681] [<ffffffff812fb286>] ? do_page_fault+0x2e0/0x2fc
[85337.796124] [<ffffffff812f9125>] ? page_fault+0x25/0x30
[85337.827976] INFO: task apache2:4934 blocked for more than 120 seconds.
[85337.867100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85337.914054] apache2 D ffff880005415780 0 4934 3161 0x00000000
[85337.955436] ffff88004e4de2e0 0000000000000082 0000000000000000 ffff88001043937c
[85338.000051] 0001120074736f70 000000000000f9e0 ffff880010439fd8 0000000000015780
[85338.044721] 0000000000015780 ffff88000961e2e0 ffff88000961e5d8 0000000010439370
[85338.089385] Call Trace:
[85338.104059] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85338.140567] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85338.179666] [<ffffffff8110f58b>] ? bio_alloc_bioset+0x45/0xb7
[85338.214623] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85338.253221] [<ffffffffa0225004>] ? drbd_make_request_common+0x2dc/0xc52 [drbd]
[85338.297042] [<ffffffffa0223b71>] ? inc_ap_bio+0x6a/0x12e [drbd]
[85338.333018] [<ffffffff8106389a>] ? autoremove_wake_function+0x0/0x2e
[85338.371606] [<ffffffffa0225bb0>] ? drbd_make_request_26+0x236/0x34e [drbd]
[85338.413325] [<ffffffff8117c64b>] ? generic_make_request+0x299/0x2f9
[85338.451374] [<ffffffff8118feaf>] ? radix_tree_delete+0x102/0x1ba
[85338.487890] [<ffffffff8117c781>] ? submit_bio+0xd6/0xf2
[85338.519709] [<ffffffff8110b18b>] ? submit_bh+0xf5/0x115
[85338.551529] [<ffffffff8110d701>] ? __block_write_full_page+0x1d6/0x2ac
[85338.591154] [<ffffffff8110c486>] ? end_buffer_async_write+0x0/0x13b
[85338.629217] [<ffffffff8110f744>] ? blkdev_get_block+0x0/0x57
[85338.663643] [<ffffffff810bc92d>] ? shrink_page_list+0x375/0x623
[85338.699638] [<ffffffff810bd2fe>] ? shrink_list+0x44a/0x731
[85338.733025] [<ffffffff810b953c>] ? determine_dirtyable_memory+0xd/0x1d
[85338.772639] [<ffffffff810b95b4>] ? get_dirty_limits+0x1d/0x259
[85338.808113] [<ffffffff810bd865>] ? shrink_zone+0x280/0x342
[85338.841490] [<ffffffffa015c226>] ? mb_cache_shrink_fn+0x26/0x129 [mbcache]
[85338.883193] [<ffffffff810bda68>] ? shrink_slab+0x141/0x153
[85338.916598] [<ffffffff810be929>] ? try_to_free_pages+0x232/0x38e
[85338.953083] [<ffffffff810bb957>] ? isolate_pages_global+0x0/0x20f
[85338.990107] [<ffffffff810b8a03>] ? __alloc_pages_nodemask+0x3bb/0x5d0
[85339.029218] [<ffffffff81034c35>] ? pte_alloc_one+0xe/0x31
[85339.062080] [<ffffffff810cab65>] ? __pte_alloc+0x16/0xc6
[85339.094423] [<ffffffff810d7760>] ? __swap_duplicate+0x50/0x140
[85339.129904] [<ffffffff810cc8df>] ? copy_page_range+0x30d/0x711
[85339.165353] [<ffffffff8104ad59>] ? dup_mm+0x2c5/0x3f3
[85339.196136] [<ffffffff8104b8e2>] ? copy_process+0xa26/0x11ad
[85339.230560] [<ffffffff8104c1c0>] ? do_fork+0x157/0x31e
[85339.261866] [<ffffffff810ffe69>] ? alloc_fd+0x67/0x10c
[85339.293170] [<ffffffff810eb2f3>] ? fd_install+0x2e/0x5a
[85339.324994] [<ffffffff81010e63>] ? stub_clone+0x13/0x20
[85339.356822] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
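
A long trace like the one above is easier to compare across tasks once it
is reduced to "which task, which functions". The following is a minimal
Python sketch of that idea, not a standard tool: the names summarize,
tasks_through and SAMPLE are made up for this illustration, and the
regular expressions assume the 2.6.32-style "INFO: task NAME:PID blocked"
and "[<addr>] ? symbol+0xoff/0xlen" line formats seen in this log.

```python
import re

# Header printed by the hung-task detector for each blocked task.
# The task name may itself contain ':' (e.g. "flush-147:3").
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# One call-trace frame, e.g. "[<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?:\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# A few lines copied from the log above, used only as demo input.
SAMPLE = """\
[85324.832015] INFO: task kswapd0:28 blocked for more than 120 seconds.
[85325.113218] [<ffffffffa01c2519>] ? lc_get+0x4c/0x1de [lru_cache]
[85325.149714] [<ffffffffa022783d>] ? drbd_al_begin_io+0xd3/0x195 [drbd]
[85326.319121] INFO: task kjournald:2223 blocked for more than 120 seconds.
[85326.658382] [<ffffffff812f7ae8>] ? io_schedule+0x73/0xb7
[85327.170061] INFO: task flush-147:3:2224 blocked for more than 120 seconds.
[85327.554658] [<ffffffffa02278a9>] ? drbd_al_begin_io+0x13f/0x195 [drbd]
"""

def summarize(log_text):
    """Map (task name, pid) -> list of symbols in that task's call trace."""
    traces = {}
    current = None
    for line in log_text.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            traces[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            traces[current].append(frame.group(1))
    return traces

def tasks_through(traces, symbol):
    """Return the sorted (task, pid) pairs whose trace contains symbol."""
    return sorted(key for key, frames in traces.items() if symbol in frames)

if __name__ == "__main__":
    traces = summarize(SAMPLE)
    for (task, pid), frames in traces.items():
        print(task, pid, "->", ", ".join(frames))
    print("via drbd_al_begin_io:", tasks_through(traces, "drbd_al_begin_io"))
```

Feeding the full saved log text to summarize() instead of SAMPLE gives
the same kind of listing for all ten reports. Every frame in these traces
is prefixed with "?", which the kernel uses for entries it could not
verify, so the result is a hint about where tasks are stuck rather than
proof.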

These hangs seemed to coincide with large spikes in memory consumption,
when most of the server's physical memory was in use and swapping
increased significantly. The server has 4GB of RAM and 3GB of swap space.
The most swap space I've seen in use was 0.9GB, during one of the periods
of heavy memory consumption. However, I wasn't able to measure swap usage
during the freezes themselves, so I can't confirm this correlation.
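
One way to capture swap usage leading up to a future freeze would be to
sample /proc/meminfo at a short interval and force each sample to disk.
The following is a minimal Python sketch under stated assumptions: the
function names and the /tmp/swap-usage.log path are hypothetical, and it
assumes the log file is on a filesystem that does not go through DRBD.
If all disk I/O stalls during a freeze, the last samples may still be
lost.

```python
import os
import sys
import time

def read_meminfo():
    """Return the raw text of /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        return f.read()

def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: integer value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def swap_used_kb(info):
    """Swap in use, in kB, derived from SwapTotal and SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]

def log_samples(path, interval=5.0, count=None, read=read_meminfo):
    """Append one timestamped sample per interval; count=None runs forever."""
    written = 0
    with open(path, "a") as out:
        while count is None or written < count:
            info = parse_meminfo(read())
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            out.write("%s swap_used_kB=%d mem_free_kB=%d\n"
                      % (stamp, swap_used_kb(info), info["MemFree"]))
            out.flush()
            os.fsync(out.fileno())  # push the sample to disk before sleeping
            written += 1
            if count is None or written < count:
                time.sleep(interval)
    return written

if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python swaplog.py /tmp/swap-usage.log  (path is illustrative)
    log_samples(sys.argv[1])
```

The fsync after every line trades some extra I/O for a better chance
that the most recent sample is on disk when the machine has to be reset.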

Some background information: the /usr, /var, /var/log, /home, and /srv
filesystems run on DRBD devices, which use LVM logical volumes as their
underlying storage. LVM in turn uses two hard drives mirrored with MD
RAID1 as its physical volume. The DRBD devices are in the Primary role on
this server, and the Secondary server is connected over a long-distance
link. The root filesystem, /tmp, and swap bypass DRBD and use LVM logical
volumes directly. These logical volumes reside on the same physical
volume as the logical volumes that back the DRBD devices mentioned above.

I had been running this server in the same configuration for two weeks
with no problems until yesterday.

This server is running Debian 5.0.5 with the Debian 2.6.32-bpo.5-amd64
kernel (supplied by the linux-image-2.6.32-bpo.5-amd64-2.6.32-20~bpo50+1
package).

Any ideas as to what the root cause of this problem might be? Please let
me know if there is any additional information I should provide.

Thanks!
Alex
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user