Re: [Ocfs2-users] ocfs cluster node keeps rebooting

2013-02-22 Thread richard -rw- weinberger
On Mon, Jan 14, 2013 at 9:06 PM, Bill Zha lfl200...@yahoo.com wrote:
> *** ocfs2 is very sorry to be fencing this system by restarting ***

Which I/O scheduler are you using?
Consider switching to deadline.
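
For anyone following along, here is a minimal sketch of how the active I/O scheduler could be checked. It assumes a kernel that exposes /sys/block/<device>/queue/scheduler, where the active scheduler is shown in square brackets; older kernels, possibly including the one in this thread, may only accept the elevator=deadline boot parameter instead. The function names and the sample string are illustrative, not taken from the thread.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler shown in [brackets] in the sysfs file contents."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Some kernels list a single name without brackets (for example "none").
    return sysfs_text.strip()


def read_scheduler(device: str) -> str:
    """Read the active scheduler for a block device name such as "sda".

    The device name is supplied by the caller; nothing in the thread says
    which physical device backs dm-14.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Illustrative sysfs contents, not captured from the affected node.
print(active_scheduler("noop anticipatory deadline [cfq]"))
```

If the result is not deadline, the same sysfs file can usually be written to by root to switch schedulers at runtime on kernels that support it.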

-- 
Thanks,
//richard

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] ocfs cluster node keeps rebooting

2013-01-14 Thread Bill Zha
Hi Sunil and All,

We have a 10-node Redhat 4.2 OCFS cluster running version 1.2.5-6.  One of 
the nodes has been rebooting almost every day since last week.  The entire 
cluster had been stable for the past year or so.  I captured the following 
console output.  Could you, or anyone who has had a similar issue, let me 
know the possible cause of these reboots?

(25271,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156758.101016 now 1358156788.97593 dr 1358156758.101008 adv 
1358156758.101022:1358156758.101024 func (5d21e188:507) 
1357953447.247097:1357953447.247100)
(25267,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156758.666788 now 1358156788.663604 dr 1358156760.666794 
adv 1358156758.666793:1358156758.666795 func (5d21e188:505) 
1357953453.107343:1357953453.107349)
(25267,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156758.848933 now 1358156788.953367 dr 1358156760.847939 
adv 1358156758.848939:1358156758.848941 func (0e6eb1eb:505) 
1357965605.352156:1357965605.352162)
(25267,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156759.108373 now 1358156789.243003 dr 1358156761.108392 
adv 1358156759.108376:1358156759.108378 func (af22ae1f:502) 
1357914301.741127:1357914301.741130)
(25275,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156759.626366 now 1358156789.623629 dr 1358156789.622319 
adv 1358156759.626369:1358156759.626371 func (abd851aa:505) 
1357965605.363679:1357965605.363685)
(25275,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156759.656350 now 1358156789.913330 dr 1358156761.656039 
adv 1358156759.656354:1358156759.656355 func (0e6eb1eb:502) 
1357907401.318584:1357907401.318587)
(25275,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156759.663467 now 1358156790.203323 dr 1358156761.662745 
adv 1358156759.663470:1358156759.663472 func (7dcded64:502) 
1357875986.764566:1357875986.764568)
(25275,4):o2net_idle_timer:1426 here are some times that might help debug the 
situation: (tmr 1358156759.987324 now 1358156790.493342 dr 1358156761.987117 
adv 1358156759.987327:1358156759.987329 func (6bcd2bc6:502) 
1357875995.47:1357875995.55)
(25,7):o2hb_write_timeout:269 ERROR: Heartbeat write timeout to device dm-14 
after 18 milliseconds
Heartbeat thread (25) printing last 24 blocking operations (cur = 11):
Heartbeat thread stuck at msleep, stuffing current time into that blocker 
(index 11)
Index 12: took 0 ms to do allocating bios for read
Index 13: took 0 ms to do bio alloc read
Index 14: took 0 ms to do bio add page read
Index 15: took 0 ms to do bio add page read
Index 16: took 0 ms to do submit_bio for read
Index 17: took 0 ms to do waiting for read completion
Index 18: took 0 ms to do bio alloc write
Index 19: took 0 ms to do bio add page write
Index 20: took 0 ms to do submit_bio for write
Index 21: took 0 ms to do checking slots
Index 22: took 0 ms to do waiting for write completion
Index 23: took 100897 ms to do msleep
Index 0: took 0 ms to do allocating bios for read
Index 1: took 0 ms to do bio alloc read
Index 2: took 0 ms to do bio add page read
Index 3: took 0 ms to do bio add page read
Index 4: took 0 ms to do submit_bio for read
Index 5: took 0 ms to do waiting for read completion
Index 6: took 0 ms to do bio alloc write
Index 7: took 0 ms to do bio add page write
Index 8: took 0 ms to do submit_bio for write
Index 9: took 0 ms to do checking slots
Index 10: took 0 ms to do waiting for write completion
Index 11: took 313 ms to do msleep
*** ocfs2 is very sorry to be fencing this system by restarting ***
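
A rough sketch of how the console output above could be summarized, in case it helps with reading it. It makes one assumption that the log itself does not state: each timestamp is taken to be seconds followed by an unpadded microsecond count (values such as 1357875995.47 suggest this), so 1358156788.97593 is read as 1358156788 s plus 97593 us. The function names are hypothetical, and the sample strings are copied from the output above.

```python
import re
from typing import Tuple


def to_usec(stamp: str) -> int:
    """Convert 'SEC.USEC' to integer microseconds (USEC assumed unpadded)."""
    sec, usec = stamp.split(".")
    return int(sec) * 1_000_000 + int(usec)


def idle_seconds(entry: str) -> float:
    """Seconds between the 'tmr' and 'now' stamps of one o2net_idle_timer entry."""
    match = re.search(r"tmr\s+(\d+\.\d+)\s+now\s+(\d+\.\d+)", entry)
    if match is None:
        raise ValueError("no tmr/now pair found")
    return (to_usec(match.group(2)) - to_usec(match.group(1))) / 1_000_000


def slowest_operation(log: str) -> Tuple[int, str]:
    """Return (milliseconds, name) of the slowest heartbeat blocking operation."""
    ops = re.findall(r"Index \d+: took (\d+) ms to do (.+)", log)
    if not ops:
        raise ValueError("no heartbeat operations found")
    return max((int(ms), name.strip()) for ms, name in ops)


entry = "(tmr 1358156758.101016 now 1358156788.97593 dr 1358156758.101008"
heartbeat = """Index 22: took 0 ms to do waiting for write completion
Index 23: took 100897 ms to do msleep
Index 11: took 313 ms to do msleep"""

print(idle_seconds(entry))           # about 30 seconds under the assumption above
print(slowest_operation(heartbeat))  # the operation with the largest duration
```

Under that reading, the first o2net entry shows roughly 30 seconds between tmr and now, and the heartbeat dump shows nearly all of the time going to a single msleep of 100897 ms rather than to the disk reads or writes.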


Thank you so much for your help!


Bill



Re: [Ocfs2-users] ocfs cluster node keeps rebooting

2013-01-14 Thread Sunil Mushran
1.2.5 is a 6+ year old release. You may want to use something more current.


