Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hi Goldwyn,

Apologies for the delayed reply. The hung Apache process / OCFS2 issue cropped up again, so I thought I'd pass along the contents of /proc/<pid>/stack for a few affected processes:

gjones@slipapp02:~> sudo cat /proc/27521/stack
gjones's password:
[811663b4] poll_schedule_timeout+0x44/0x60
[81166d56] do_select+0x5a6/0x670
[81166fbe] core_sys_select+0x19e/0x2d0
[811671a5] sys_select+0xb5/0x110
[815429bd] system_call_fastpath+0x1a/0x1f
[7f394bdd5f23] 0x7f394bdd5f23
[] 0x

gjones@slipapp02:~> sudo cat /proc/27530/stack
[81249721] sys_semtimedop+0x5a1/0x8b0
[815429bd] system_call_fastpath+0x1a/0x1f
[7f394bdddb77] 0x7f394bdddb77
[] 0x

gjones@slipapp02:~> sudo cat /proc/27462/stack
[81249721] sys_semtimedop+0x5a1/0x8b0
[815429bd] system_call_fastpath+0x1a/0x1f
[7f394bdddb77] 0x7f394bdddb77
[] 0x

gjones@slipapp02:~> sudo cat /proc/27526/stack
[81249721] sys_semtimedop+0x5a1/0x8b0
[815429bd] system_call_fastpath+0x1a/0x1f
[7f394bdddb77] 0x7f394bdddb77
[] 0x

Additionally, in dmesg I see, for example:

[774981.361149] (/usr/sbin/httpd,8266,3):ocfs2_unlink:951 ERROR: status = -2
[775896.135467] (/usr/sbin/httpd,8435,3):ocfs2_check_dir_for_entry:2119 ERROR: status = -17
[775896.135474] (/usr/sbin/httpd,8435,3):ocfs2_mknod:459 ERROR: status = -17
[775896.135477] (/usr/sbin/httpd,8435,3):ocfs2_create:629 ERROR: status = -17
[788406.624126] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4491991450, last ping 4491992701, now 4491993952
[788406.624138] connection1:0: detected conn error (1011)
[788406.640132] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4491991451, last ping 4491992702, now 4491993956
[788406.640142] connection2:0: detected conn error (1011)
[788406.928134] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4491991524, last ping 4491992775, now 4491994028
[788406.928150] connection4:0: detected conn error (1011)
[788406.944147] connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4491991528, last ping 4491992779, now 4491994032
[788406.944165] connection5:0: detected conn error (1011)
[788408.640123] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4491991954, last ping 4491993205, now 4491994456
[788408.640134] connection3:0: detected conn error (1011)
[788409.907968] connection1:0: detected conn error (1020)
[788409.908280] connection2:0: detected conn error (1020)
[788409.912683] connection4:0: detected conn error (1020)
[788409.913152] connection5:0: detected conn error (1020)
[788411.491818] connection3:0: detected conn error (1020)

That repeats for a bit, and then I see:

[1952161.012214] INFO: task /usr/sbin/httpd:27491 blocked for more than 480 seconds.
[1952161.012219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1952161.012221] /usr/sbin/httpd D 88081fc52b40 0 27491 27449 0x
[1952161.012226] 88031a85dc50 0082 880532a92640 88031a85dfd8
[1952161.012231] 88031a85dfd8 88031a85dfd8 8807f791c300 880532a92640
[1952161.012235] 8115f3ae 8802804bdd98 880532a92640
[1952161.012239] Call Trace:
[1952161.012251] [81538fea] __mutex_lock_slowpath+0xca/0x140
[1952161.012257] [81538b0a] mutex_lock+0x1a/0x40
[1952161.012262] [81160e80] do_lookup+0x290/0x340
[1952161.012269] [81161c7f] path_lookupat+0x10f/0x700
[1952161.012274] [8116229c] do_path_lookup+0x2c/0xc0
[1952161.012279] [8116372d] user_path_at_empty+0x5d/0xb0
[1952161.012283] [81158d9d] vfs_fstatat+0x2d/0x70
[1952161.012288] [81158fe2] sys_newstat+0x12/0x30
[1952161.012293] [815429bd] system_call_fastpath+0x1a/0x1f
[1952161.012308] [7f394bdcfb05] 0x7f394bdcfb04
[1952161.012382] INFO: task /usr/sbin/httpd:27560 blocked for more than 480 seconds.
[1952161.012384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1952161.012385] /usr/sbin/httpd D 88081fd52b40 0 27560 27449 0x
[1952161.012389] 880224023c50 0086 88024326e580 880224023fd8
[1952161.012393] 880224023fd8 880224023fd8 8807f79cc800 88024326e580
[1952161.012397] 8115f3ae 8802804bdd98 88024326e580
[1952161.012401] Call Trace:
[1952161.012406] [81538fea] __mutex_lock_slowpath+0xca/0x140
[1952161.012410] [81538b0a] mutex_lock+0x1a/0x40
[1952161.012415] [81160e80] do_lookup+0x290/0x340
[1952161.012420] [81161c7f] path_lookupat+0x10f/0x700
[1952161.012425] [8116229c] do_path_lookup+0x2c/0xc0
[1952161.012430] [8116372d] user_path_at_empty+0x5d/0xb0
[1952161.012434] [81158d9d] vfs_fstatat+0x2d/0x70
[1952161.012438]
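[A note for readers: stacks like the ones above can be collected for every uninterruptible (D-state) process in one pass, rather than one PID at a time. A minimal sketch using only standard ps and procfs:

    # dump the kernel stack of every process currently in D state
    for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
        echo "== PID $pid =="
        sudo cat "/proc/$pid/stack"
    done

Capturing all of them at once makes it easier to see whether the blocked tasks share a common code path, as they do here.]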
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hello Goldwyn,

Thanks for taking a look at this. So, then, it does seem to be DLM related. We were running fine for a few weeks, and then it came up again this morning and has been going on throughout the day.

Regarding the DLM debugging, I allowed debugging for DLM_GLUE, DLM_THREAD, DLM_MASTER and DLM_RECOVERY. However, I don't see any DLM logging output in dmesg or syslog -- is there perhaps another way to get at the actual DLM log? I've searched around a bit but didn't find anything that made it clear.

As for OCFS2 and iSCSI communications, they use the same physical network interface but different VLANs on that interface. The connectionX:0 errors, then, seem to indicate an issue with the iSCSI connection. The system logs and monitoring software don't show any warnings or errors about the interface going down, so the only thing I can think of is the connection load balancing on the SAN, though that's merely a hunch. Maybe I should mail the list and see if anyone has a similar setup.

If you could please point me in the right direction to make use of the DLM debugging via debugfs.ocfs2, I would appreciate it.

Thanks again,
Gavin W. Jones
Where 2 Get It, Inc.

On Tue, Aug 6, 2013 at 4:16 PM, Goldwyn Rodrigues rgold...@suse.de wrote:
<snipped; the stack traces and dmesg output quoted here appear in full in the first message above>
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
If the storage connectivity is not stable, then DLM issues are to be expected. In this case, the processes are all trying to take the read lock. One possible scenario is that the node holding the write lock is not able to relinquish it because it cannot flush the updated inodes to disk. I would suggest you look into the load balancing and how it affects the iSCSI connectivity from the hosts.

On Tue, Aug 6, 2013 at 2:51 PM, Gavin Jones gjo...@where2getit.com wrote:
<snipped; see Gavin's message above>
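[A note for readers: the "ping timeout of 5 secs expired" lines in the dmesg output above are open-iscsi's NOP-Out keepalive giving up on the session. A sketch of how that state is usually inspected, and which iscsid.conf knobs control the keepalive -- the values shown are the stock defaults, and raising them trades slower failure detection for riding out brief SAN events; it does not fix the underlying network problem:

    # show session state, connection state and negotiated timeouts
    iscsiadm -m session -P 3

    # /etc/iscsi/iscsid.conf -- NOP-Out keepalive interval and timeout, in seconds
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 5
]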
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hi Gavin,

On 08/06/2013 04:51 PM, Gavin Jones wrote:
> Regarding the DLM debugging, I allowed debugging for DLM_GLUE, DLM_THREAD,
> DLM_MASTER and DLM_RECOVERY. However, I don't see any DLM logging output
> in dmesg or syslog -- is there perhaps another way to get at the actual
> DLM log? I've searched around a bit but didn't find anything that made it
> clear.

Unfortunately, CONFIG_OCFS2_DEBUG_MASKLOG is not enabled for openSUSE kernels, only for SLES kernels. Sorry about that :( However, you can recompile the kernel with this option enabled in the config file.

> As for OCFS2 and iSCSI communications, they use the same physical network
> interface but different VLANs on that interface. The connectionX:0
> errors, then, seem to indicate an issue with the iSCSI connection. The
> system logs and monitoring software don't show any warnings or errors
> about the interface going down, so the only thing I can think of is the
> connection load balancing on the SAN, though that's merely a hunch.

You will not have anything in the logs if the network issues are intermittent. Perhaps a simple ping while the issue is occurring is the best tool. My suspicion of a network issue keeps getting stronger with the information you have given me so far. Also, as Sunil mentioned, you have a problem anyway if the storage does not respond.

> If you could please point me in the right direction to make use of the
> DLM debugging via debugfs.ocfs2, I would appreciate it.

<snipped>

--
Goldwyn
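[A note for readers: on a kernel that does have CONFIG_OCFS2_DEBUG_MASKLOG compiled in, the log masks Gavin names are toggled through debugfs.ocfs2, roughly as sketched below; the resulting messages land in dmesg/syslog:

    # show all log masks and their current state (allow/deny/off)
    debugfs.ocfs2 -l

    # enable the DLM-related masks
    debugfs.ocfs2 -l DLM_GLUE DLM_THREAD DLM_MASTER DLM_RECOVERY allow

    # switch them off again when done
    debugfs.ocfs2 -l DLM_GLUE DLM_THREAD DLM_MASTER DLM_RECOVERY off
]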
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hello,

Sure, I'd be happy to provide such information next time this occurs. Can you elaborate, or point me at documentation / a procedure regarding the DLM debug logs and what would be helpful to see? I have read "Troubleshooting OCFS2" [1], including the section "Debugging File System Locks" -- is this what you're referring to?

Not sure if this will provide additional context or just muddy the waters, but I thought I'd provide some syslog messages from an affected server from the last time this occurred:

Jul 14 15:36:55 slipapp07 kernel: [2173588.704093] o2net: Connection to node slipapp03 (num 2) at 172.16.40.122: has been idle for 30.97 secs, shutting it down.
Jul 14 15:36:55 slipapp07 kernel: [2173588.704146] o2net: No longer connected to node slipapp03 (num 2) at 172.16.40.122:
Jul 14 15:36:55 slipapp07 kernel: [2173588.704279] (kworker/u:1,12787,4):dlm_do_assert_master:1665 ERROR: Error -112 when sending message 502 (key 0xdc8be796) to node 2
Jul 14 15:36:55 slipapp07 kernel: [2173588.704295] (kworker/u:5,26056,5):dlm_do_master_request:1332 ERROR: link to 2 went down!
Jul 14 15:36:55 slipapp07 kernel: [2173588.704301] (kworker/u:5,26056,5):dlm_get_lock_resource:917 ERROR: status = -112
Jul 14 15:37:25 slipapp07 kernel: [2173618.784153] o2net: No connection established with node 2 after 30.0 seconds, giving up.

<snip>

Jul 14 15:39:14 slipapp07 kernel: [2173727.920793] (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -112 when sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.920833] (/usr/sbin/httpd,5023,5):dlm_send_remote_lock_request:336 ERROR: A08674A831ED4048B5136BD8613B21E0: res N0152a8da, Error -112 send CREATE LOCK to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.930562] (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.944998] (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.951511] (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.973848] (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.990216] (kworker/u:2,13894,7):dlm_do_assert_master:1665 ERROR: Error -107 when sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173728.024139] (/usr/sbin/httpd,5023,5):dlm_send_remote_lock_request:336 ERROR: A08674A831ED4048B5136BD8613B21E0: res N0152a8da, Error -107 send CREATE LOCK to node 4

<snip, many, many more like the above>

Which I suppose would indicate DLM issues; I have previously tried to investigate this (via the above-mentioned guide) but was unable to make real headway. I apologize for the rather basic questions...

Thanks,
Gavin W. Jones
Where 2 Get It, Inc.

[1]: http://docs.oracle.com/cd/E37670_01/E37355/html/ol_tshoot_ocfs2.html

On Wed, Jul 17, 2013 at 7:07 AM, Goldwyn Rodrigues rgold...@suse.de wrote:
<snipped; see Goldwyn's message of Jul 17 below>
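[A note for readers: the "Debugging File System Locks" section Gavin cites comes down to dumping lock resources through debugfs.ocfs2. A minimal sketch -- the device name is illustrative, and the -B flag (show only busy locks) needs a reasonably recent ocfs2-tools; plain "fs_locks" works otherwise:

    # list lock resources on a mounted volume; -B restricts to busy ones
    debugfs.ocfs2 -R "fs_locks -B" /dev/sda1

    # then inspect one lock resource reported above on the DLM side
    debugfs.ocfs2 -R "dlm_locks <lockname>" /dev/sda1

As an aside, the "idle for 30.97 secs" in the o2net messages matches the stock O2CB idle timeout, normally set in /etc/sysconfig/o2cb as O2CB_IDLE_TIMEOUT_MS=30000; raising it only delays detection of, rather than fixes, an interconnect problem.]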
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hi Gavin,

On 07/16/2013 01:17 PM, Gavin Jones wrote:
<snipped; Gavin's version and size details, and the original problem description, appear in full below>

This seems like a DLM issue. Could you provide the /proc/<pid>/stack of the process when the issue happens next? Does it change over time? If it is indeed stuck waiting on a DLM lock, the debug logs of DLM* might help (debugfs.ocfs2 -l).

<remainder of the quoted original post snipped>
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hello,

Block size: 4 kB
Kernel version: 3.4.6-2.10-default
OCFS2: 1.5.0
Distribution is openSUSE 12.2.

Thanks,
Gavin W. Jones
Where 2 Get It, Inc.

On Mon, Jul 15, 2013 at 7:32 PM, Srinivas Eeda srinivas.e...@oracle.com wrote:
<snipped; see Srinivas's reply and the original post below>
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hello,

Apologies for my earlier reply; I did not see the request for cluster size as well as block size. According to o2info, the cluster size is 65536.

Thanks,
Gavin W. Jones
Where 2 Get It, Inc.

On Tue, Jul 16, 2013 at 9:58 AM, Gavin Jones gjo...@where2getit.com wrote:
<snipped; see Gavin's previous message above>
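[A note for readers: the geometry Gavin reports (4 kB blocks, 64 kB clusters) can be read straight off a volume with o2info from ocfs2-tools; a sketch, with the device name illustrative:

    # block size, cluster size and feature flags of an OCFS2 volume
    o2info --volinfo /dev/sda1
]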
[Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
Hello,

We have a 16-node OCFS2 cluster used for web serving duties. Each node mounts (the same) 6 OCFS2 volumes. Shared data includes client files, application files for our webapp, log files, and configuration files. Storage is provided by 2x EqualLogic PS400E iSCSI SANs, each having 12 drives in a RAID50; the units are in a 'Group'.

The problem we are having is that periodically, maybe once a week or so, we get several Apache processes on a handful of nodes that get stuck in D state and are unable to recover. This greatly increases server load, causes more Apache processes to back up, OCFS2 starts complaining about unresponsive nodes, and before you know it, the cluster is down.

This seems to occur most often when we are doing writes + reads; if it is just reads, the cluster hums along. However, when we need to update many files or remove lots of files (think temporary images) in addition to normal read activity, we have the above-mentioned problem.

We have done some searching and found http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg05525.html which describes a similar problem with write activity. In that case, the problem was allocating contiguous space on a fragmented filesystem, and the solution was to adjust the mount option 'localalloc'. We are wondering if we are in a similar position. Below is the output from the stat_sysdir_analyze.sh script mentioned in the link above, which analyzes stat_sysdir.sh output; I've included the two volumes that seem to be our 'problem' volumes.

Volume 1:
bash stat_sysdir_analyze.sh sde1-client-20130715.txt
Number of clust. | Contiguous cluster size
            4549 | 510 and smaller
            1825 | 511

Volume 2:
bash stat_sysdir_analyze.sh sdd1-data-20130715.txt
Number of clust. | Contiguous cluster size
             175 | 510 and smaller
              23 | 511

Any evidence here of excessive fragmentation that tuning localalloc would help with?

Also regarding localalloc, I notice it is different for the above two volumes on many of the nodes; I find this interesting as the cluster is supposed to make an educated guess at this value. For instance:

/dev/sda1 on /u/client type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
/dev/sde1 on /u/data type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)
/dev/sdd1 on /u/client type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=9,coherency=full,user_xattr,noacl)
/dev/sdb1 on /u/data type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)
/dev/sda1 on /u/client type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=11,coherency=full,user_xattr,noacl)
/dev/sdc1 on /u/data type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)
/dev/sda1 on /u/client type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
/dev/sdc1 on /u/data type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=7,coherency=full,user_xattr,noacl)

I'm not sure why the cluster would be picking different values depending on the node?
Anyway, any opinions, advice, or tuning suggestions are greatly appreciated. This business of the cluster hanging is turning into quite a problem. I'll provide any other information upon request.

Thanks,
Gavin W. Jones
Where 2 Get It, Inc.

--
There has grown up in the minds of certain groups in this country the notion that because a man or corporation has made a profit out of the public for a number of years, the government and the courts are charged with the duty of guaranteeing such profit in the future, even in the face of changing circumstances and contrary to public interest. This strange doctrine is not supported by statute nor common law. ~Robert Heinlein
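[A note for readers, as a footnote to the localalloc and fragmentation discussion in this post: free-space fragmentation and the local alloc window can also be checked without the helper scripts. A sketch, assuming o2info from ocfs2-tools 1.6 or later; the device names, 1024 kB chunk size and 16 MB window are illustrative:

    # histogram of free-space extents, bucketed into 1024 kB chunks
    o2info --freefrag 1024 /dev/sde1

    # pin the local alloc window (in MB) explicitly instead of letting
    # the kernel guess, e.g. in /etc/fstab or on the command line
    mount -t ocfs2 -o _netdev,localalloc=16 /dev/sde1 /u/data

Pinning the same localalloc value on every node removes the per-node variation Gavin observes, since the kernel otherwise sizes the window from the free space it sees at mount time.]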
Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mixed read+write workloads
I am not entirely sure about the significant slowdown and cluster outage, but from your description and the information you provided, you are seeing fragmentation-related issues. What is the OCFS2/kernel version, and what are the cluster size and block size of these volumes?

On 07/15/2013 01:33 PM, Gavin Jones wrote:
<snipped; see the original post above>