Same problem here: in a web server cluster, httpd processes sometimes end up in D state. I have to restart the node, or even the whole cluster if more than one node is locked. I'm using Red Hat 5.4 on HP hardware.

Regards,

2011/1/4 Paras pradhan <[email protected]>

> I had the same problem. it locked the whole gfs cluster and had to
> reboot the node. after reboot all is fine now but still trying to find
> out what has caused it.
>
> Paras
>
> On Monday, January 3, 2011, InterNetworX | Hostmaster
> <[email protected]> wrote:
> > Hello,
> >
> > we are using GFS2 but sometimes there are processes hanging in D state:
> >
> > # ps axl | grep D
> > F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> > 0     0 14220 14219  20   0  19624  1916 -      Ds   ?          0:00 /usr/lib/postfix/master -t
> > 0     0 14555 14498  20   0  16608  1716 -      D+   /mnt/storage/openvz/root/129/dev/pts/0 0:00 apt-get install less
> > 0     0 15068 15067  19  -1  36844  2156 -      D<s  ?          0:00 /usr/lib/postfix/master -t
> > 0     0 16603 16602  19  -1  36844  2156 -      D<s  ?          0:00 /usr/lib/postfix/master -t
> > 4   101 19534 13238  19  -1  33132  2984 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
> > 4   101 19542 13238  19  -1  33116  2976 -      D<   ?          0:00 smtpd -n smtp -t inet -u -c
> > 0     0 19735 13068  20   0   7548   880 -      S+   pts/0      0:00 grep D
> >
> > dmesg shows this message many times:
> >
> > [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> > [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [11142.334310] master D ffff88032b644800 0 14220 14219 0x00000000
> > [11142.334315] ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> > [11142.334318] ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> > [11142.334322] 0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> > [11142.334325] Call Trace:
> > [11142.334340] [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> > [11142.334347] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> > [11142.334353] [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> > [11142.334358] [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> > [11142.334363] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> > [11142.334367] [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> > [11142.334370] [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> > [11142.334376] [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> > [11142.334383] [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> > [11142.334386] [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> > [11142.334389] [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> > [11142.334393] [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> > [11142.334396] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> >
> > Any idea what is going wrong? Do you need any more informations?
> >
> > Mario

-- 
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************

-- 
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
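
A side note on the `ps axl | grep D` check quoted in this thread: that pattern matches any line containing a capital D (the header and the grep process itself show up in the output above), not only processes in D state. Below is a minimal Python sketch, assuming a Linux /proc filesystem, that reads the state field from /proc/<pid>/stat instead. The function names parse_stat_line and d_state_processes are made up for illustration and are not part of any cluster tooling. It only lists the blocked processes; it does not explain why GFS2 is holding them.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited (or stat was unreadable) mid-scan
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Run on an affected node, this prints one line per process in uninterruptible sleep, which can be compared against the task names in the dmesg hung-task messages.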
