Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-08-06 Thread Gavin Jones
Hi Goldwyn,

Apologies for the delayed reply.

The hung Apache process / OCFS issue cropped up again, so I thought
I'd pass along the contents of /proc/pid/stack of a few affected
processes:

gjones@slipapp02:~> sudo cat /proc/27521/stack
gjones's password:
[<ffffffff811663b4>] poll_schedule_timeout+0x44/0x60
[<ffffffff81166d56>] do_select+0x5a6/0x670
[<ffffffff81166fbe>] core_sys_select+0x19e/0x2d0
[<ffffffff811671a5>] sys_select+0xb5/0x110
[<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
[<00007f394bdd5f23>] 0x7f394bdd5f23
[<ffffffffffffffff>] 0xffffffffffffffff
gjones@slipapp02:~> sudo cat /proc/27530/stack
[<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
[<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
[<00007f394bdddb77>] 0x7f394bdddb77
[<ffffffffffffffff>] 0xffffffffffffffff
gjones@slipapp02:~> sudo cat /proc/27462/stack
[<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
[<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
[<00007f394bdddb77>] 0x7f394bdddb77
[<ffffffffffffffff>] 0xffffffffffffffff
gjones@slipapp02:~> sudo cat /proc/27526/stack
[<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
[<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
[<00007f394bdddb77>] 0x7f394bdddb77
[<ffffffffffffffff>] 0xffffffffffffffff
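
In case it is useful for reproducing this, the stacks above can be
pulled for every D-state process with a small loop along these lines
(a rough sketch, nothing OCFS2-specific assumed; run as root):

  # collect the kernel-side stack of every process in D state
  for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
      echo "=== PID $pid ==="
      cat /proc/$pid/stack
  done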


Additionally, in dmesg I see, for example,

[774981.361149] (/usr/sbin/httpd,8266,3):ocfs2_unlink:951 ERROR: status = -2
[775896.135467]
(/usr/sbin/httpd,8435,3):ocfs2_check_dir_for_entry:2119 ERROR: status
= -17
[775896.135474] (/usr/sbin/httpd,8435,3):ocfs2_mknod:459 ERROR: status = -17
[775896.135477] (/usr/sbin/httpd,8435,3):ocfs2_create:629 ERROR: status = -17
[788406.624126] connection1:0: ping timeout of 5 secs expired, recv
timeout 5, last rx 4491991450, last ping 4491992701, now 4491993952
[788406.624138] connection1:0: detected conn error (1011)
[788406.640132] connection2:0: ping timeout of 5 secs expired, recv
timeout 5, last rx 4491991451, last ping 4491992702, now 4491993956
[788406.640142] connection2:0: detected conn error (1011)
[788406.928134] connection4:0: ping timeout of 5 secs expired, recv
timeout 5, last rx 4491991524, last ping 4491992775, now 4491994028
[788406.928150] connection4:0: detected conn error (1011)
[788406.944147] connection5:0: ping timeout of 5 secs expired, recv
timeout 5, last rx 4491991528, last ping 4491992779, now 4491994032
[788406.944165] connection5:0: detected conn error (1011)
[788408.640123] connection3:0: ping timeout of 5 secs expired, recv
timeout 5, last rx 4491991954, last ping 4491993205, now 4491994456
[788408.640134] connection3:0: detected conn error (1011)
[788409.907968] connection1:0: detected conn error (1020)
[788409.908280] connection2:0: detected conn error (1020)
[788409.912683] connection4:0: detected conn error (1020)
[788409.913152] connection5:0: detected conn error (1020)
[788411.491818] connection3:0: detected conn error (1020)


that repeats for a bit and then I see

[1952161.012214] INFO: task /usr/sbin/httpd:27491 blocked for more
than 480 seconds.
[1952161.012219] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[1952161.012221] /usr/sbin/httpd D ffff88081fc52b40 0 27491 27449 0x00000000
[1952161.012226] ffff88031a85dc50 0000000000000082 ffff880532a92640
ffff88031a85dfd8
[1952161.012231] ffff88031a85dfd8 ffff88031a85dfd8 ffff8807f791c300
ffff880532a92640
[1952161.012235] ffffffff8115f3ae ffff8802804bdd98 ffff880532a92640
0000000000000000
[1952161.012239] Call Trace:
[1952161.012251] [<ffffffff81538fea>] __mutex_lock_slowpath+0xca/0x140
[1952161.012257] [<ffffffff81538b0a>] mutex_lock+0x1a/0x40
[1952161.012262] [<ffffffff81160e80>] do_lookup+0x290/0x340
[1952161.012269] [<ffffffff81161c7f>] path_lookupat+0x10f/0x700
[1952161.012274] [<ffffffff8116229c>] do_path_lookup+0x2c/0xc0
[1952161.012279] [<ffffffff8116372d>] user_path_at_empty+0x5d/0xb0
[1952161.012283] [<ffffffff81158d9d>] vfs_fstatat+0x2d/0x70
[1952161.012288] [<ffffffff81158fe2>] sys_newstat+0x12/0x30
[1952161.012293] [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
[1952161.012308] [<00007f394bdcfb05>] 0x7f394bdcfb04
[1952161.012382] INFO: task /usr/sbin/httpd:27560 blocked for more
than 480 seconds.
[1952161.012384] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[1952161.012385] /usr/sbin/httpd D ffff88081fd52b40 0 27560 27449 0x00000000
[1952161.012389] ffff880224023c50 0000000000000086 ffff88024326e580
ffff880224023fd8
[1952161.012393] ffff880224023fd8 ffff880224023fd8 ffff8807f79cc800
ffff88024326e580
[1952161.012397] ffffffff8115f3ae ffff8802804bdd98 ffff88024326e580
0000000000000000
[1952161.012401] Call Trace:
[1952161.012406] [<ffffffff81538fea>] __mutex_lock_slowpath+0xca/0x140
[1952161.012410] [<ffffffff81538b0a>] mutex_lock+0x1a/0x40
[1952161.012415] [<ffffffff81160e80>] do_lookup+0x290/0x340
[1952161.012420] [<ffffffff81161c7f>] path_lookupat+0x10f/0x700
[1952161.012425] [<ffffffff8116229c>] do_path_lookup+0x2c/0xc0
[1952161.012430] [<ffffffff8116372d>] user_path_at_empty+0x5d/0xb0
[1952161.012434] [<ffffffff81158d9d>] vfs_fstatat+0x2d/0x70
[1952161.012438] 

Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-08-06 Thread Gavin Jones
Hello Goldwyn,

Thanks for taking a look at this.  So, then, it does seem to be DLM
related.  We were running fine for a few weeks and then it came up
again this morning and has been going on throughout the day.

Regarding the DLM debugging, I allowed debugging for DLM_GLUE,
DLM_THREAD, DLM_MASTER and DLM_RECOVERY.  However, I don't see any DLM
logging output in dmesg or syslog --is there perhaps another way to
get at the actual DLM log?  I've searched around a bit but didn't find
anything that made it clear.

As for OCFS2 and iSCSI communications, they use the same physical
network interface but different VLANs on that interface.  The
connectionX:0 errors, then, seem to indicate an issue with the iSCSI
connection.  The system logs and monitoring software don't show any
warnings or errors about the interface going down, so the only thing I
can think of is the connection load balancing on the SAN, though
that's merely a hunch.  Maybe I should mail the list and see if anyone
has a similar setup.

If you could please point me in the right direction to make use of the
DLM debugging via debugfs.ocfs2, I would appreciate it.
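
For reference, what I have been trying looks roughly like the following
(a sketch; it assumes a kernel built with CONFIG_OCFS2_DEBUG_MASKLOG,
which may well be my problem):

  debugfs.ocfs2 -l                   # list all log masks and their state
  debugfs.ocfs2 -l DLM_GLUE allow    # likewise DLM_THREAD, DLM_MASTER, DLM_RECOVERY
  dmesg | tail                       # masklog output goes to the kernel ring buffer
  debugfs.ocfs2 -l DLM_GLUE off      # switch the mask back off when done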

Thanks again,

Gavin W. Jones
Where 2 Get It, Inc.

On Tue, Aug 6, 2013 at 4:16 PM, Goldwyn Rodrigues rgold...@suse.de wrote:
 Hi Gavin,


 On 08/06/2013 01:59 PM, Gavin Jones wrote:

 Hi Goldwyn,

 Apologies for the delayed reply.

 The hung Apache process / OCFS issue cropped up again, so I thought
 I'd pass along the contents of /proc/pid/stack of a few affected
 processes:

 gjones@slipapp02:~> sudo cat /proc/27521/stack
 gjones's password:
 [<ffffffff811663b4>] poll_schedule_timeout+0x44/0x60
 [<ffffffff81166d56>] do_select+0x5a6/0x670
 [<ffffffff81166fbe>] core_sys_select+0x19e/0x2d0
 [<ffffffff811671a5>] sys_select+0xb5/0x110
 [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
 [<00007f394bdd5f23>] 0x7f394bdd5f23
 [<ffffffffffffffff>] 0xffffffffffffffff
 gjones@slipapp02:~> sudo cat /proc/27530/stack
 [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
 [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
 [<00007f394bdddb77>] 0x7f394bdddb77
 [<ffffffffffffffff>] 0xffffffffffffffff
 gjones@slipapp02:~> sudo cat /proc/27462/stack
 [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
 [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
 [<00007f394bdddb77>] 0x7f394bdddb77
 [<ffffffffffffffff>] 0xffffffffffffffff
 gjones@slipapp02:~> sudo cat /proc/27526/stack
 [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
 [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
 [<00007f394bdddb77>] 0x7f394bdddb77
 [<ffffffffffffffff>] 0xffffffffffffffff


 Additionally, in dmesg I see, for example,

 [774981.361149] (/usr/sbin/httpd,8266,3):ocfs2_unlink:951 ERROR: status =
 -2
 [775896.135467]
 (/usr/sbin/httpd,8435,3):ocfs2_check_dir_for_entry:2119 ERROR: status
 = -17
 [775896.135474] (/usr/sbin/httpd,8435,3):ocfs2_mknod:459 ERROR: status =
 -17
 [775896.135477] (/usr/sbin/httpd,8435,3):ocfs2_create:629 ERROR: status =
 -17
 [788406.624126] connection1:0: ping timeout of 5 secs expired, recv
 timeout 5, last rx 4491991450, last ping 4491992701, now 4491993952
 [788406.624138] connection1:0: detected conn error (1011)
 [788406.640132] connection2:0: ping timeout of 5 secs expired, recv
 timeout 5, last rx 4491991451, last ping 4491992702, now 4491993956
 [788406.640142] connection2:0: detected conn error (1011)
 [788406.928134] connection4:0: ping timeout of 5 secs expired, recv
 timeout 5, last rx 4491991524, last ping 4491992775, now 4491994028
 [788406.928150] connection4:0: detected conn error (1011)
 [788406.944147] connection5:0: ping timeout of 5 secs expired, recv
 timeout 5, last rx 4491991528, last ping 4491992779, now 4491994032
 [788406.944165] connection5:0: detected conn error (1011)
 [788408.640123] connection3:0: ping timeout of 5 secs expired, recv
 timeout 5, last rx 4491991954, last ping 4491993205, now 4491994456
 [788408.640134] connection3:0: detected conn error (1011)
 [788409.907968] connection1:0: detected conn error (1020)
 [788409.908280] connection2:0: detected conn error (1020)
 [788409.912683] connection4:0: detected conn error (1020)
 [788409.913152] connection5:0: detected conn error (1020)
 [788411.491818] connection3:0: detected conn error (1020)


 that repeats for a bit and then I see

 [1952161.012214] INFO: task /usr/sbin/httpd:27491 blocked for more
 than 480 seconds.
 [1952161.012219] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
 disables this message.
 [1952161.012221] /usr/sbin/httpd D ffff88081fc52b40 0 27491 27449
 0x00000000
 [1952161.012226] ffff88031a85dc50 0000000000000082 ffff880532a92640
 ffff88031a85dfd8
 [1952161.012231] ffff88031a85dfd8 ffff88031a85dfd8 ffff8807f791c300
 ffff880532a92640
 [1952161.012235] ffffffff8115f3ae ffff8802804bdd98 ffff880532a92640
 0000000000000000
 [1952161.012239] Call Trace:
 [1952161.012251] [<ffffffff81538fea>] __mutex_lock_slowpath+0xca/0x140
 [1952161.012257] [<ffffffff81538b0a>] mutex_lock+0x1a/0x40
 [1952161.012262] [<ffffffff81160e80>] do_lookup+0x290/0x340
 

Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-08-06 Thread Sunil Mushran
If the storage connectivity is not stable, then DLM issues are to be
expected. In this case, the processes are all trying to take the read
lock. One possible scenario is that the node holding the write lock is
not able to relinquish the lock because it cannot flush the updated
inodes to disk. I would suggest you look into load balancing and how it
affects the iSCSI connectivity from the hosts.
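
As a first check, I would verify the session state from each host while
the problem is occurring; a sketch (target and device names will
differ):

  iscsiadm -m session                     # list active sessions and targets
  iscsiadm -m session -P 3 | grep -E 'Target:|State'   # per-session/connection state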


On Tue, Aug 6, 2013 at 2:51 PM, Gavin Jones gjo...@where2getit.com wrote:

 Hello Goldwyn,

 Thanks for taking a look at this.  So, then, it does seem to be DLM
 related.  We were running fine for a few weeks and then it came up
 again this morning and has been going on throughout the day.

 Regarding the DLM debugging, I allowed debugging for DLM_GLUE,
 DLM_THREAD, DLM_MASTER and DLM_RECOVERY.  However, I don't see any DLM
 logging output in dmesg or syslog --is there perhaps another way to
 get at the actual DLM log?  I've searched around a bit but didn't find
 anything that made it clear.

 As for OCFS2 and iSCSI communications, they use the same physical
 network interface but different VLANs on that interface.  The
 connectionX:0 errors, then, seem to indicate an issue with the iSCSI
 connection.  The system logs and monitoring software don't show any
 warnings or errors about the interface going down, so the only thing I
 can think of is the connection load balancing on the SAN, though
 that's merely a hunch.  Maybe I should mail the list and see if anyone
 has a similar setup.

 If you could please point me in the right direction to make use of the
 DLM debugging via debugfs.ocfs2, I would appreciate it.

 Thanks again,

 Gavin W. Jones
 Where 2 Get It, Inc.

 On Tue, Aug 6, 2013 at 4:16 PM, Goldwyn Rodrigues rgold...@suse.de
 wrote:
  Hi Gavin,
 
 
  On 08/06/2013 01:59 PM, Gavin Jones wrote:
 
  Hi Goldwyn,
 
  Apologies for the delayed reply.
 
  The hung Apache process / OCFS issue cropped up again, so I thought
  I'd pass along the contents of /proc/pid/stack of a few affected
  processes:
 
  gjones@slipapp02:~> sudo cat /proc/27521/stack
  gjones's password:
  [<ffffffff811663b4>] poll_schedule_timeout+0x44/0x60
  [<ffffffff81166d56>] do_select+0x5a6/0x670
  [<ffffffff81166fbe>] core_sys_select+0x19e/0x2d0
  [<ffffffff811671a5>] sys_select+0xb5/0x110
  [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
  [<00007f394bdd5f23>] 0x7f394bdd5f23
  [<ffffffffffffffff>] 0xffffffffffffffff
  gjones@slipapp02:~> sudo cat /proc/27530/stack
  [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
  [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
  [<00007f394bdddb77>] 0x7f394bdddb77
  [<ffffffffffffffff>] 0xffffffffffffffff
  gjones@slipapp02:~> sudo cat /proc/27462/stack
  [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
  [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
  [<00007f394bdddb77>] 0x7f394bdddb77
  [<ffffffffffffffff>] 0xffffffffffffffff
  gjones@slipapp02:~> sudo cat /proc/27526/stack
  [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
  [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
  [<00007f394bdddb77>] 0x7f394bdddb77
  [<ffffffffffffffff>] 0xffffffffffffffff
 
 
  Additionally, in dmesg I see, for example,
 
  [774981.361149] (/usr/sbin/httpd,8266,3):ocfs2_unlink:951 ERROR: status
 =
  -2
  [775896.135467]
  (/usr/sbin/httpd,8435,3):ocfs2_check_dir_for_entry:2119 ERROR: status
  = -17
  [775896.135474] (/usr/sbin/httpd,8435,3):ocfs2_mknod:459 ERROR: status =
  -17
  [775896.135477] (/usr/sbin/httpd,8435,3):ocfs2_create:629 ERROR: status
 =
  -17
  [788406.624126] connection1:0: ping timeout of 5 secs expired, recv
  timeout 5, last rx 4491991450, last ping 4491992701, now 4491993952
  [788406.624138] connection1:0: detected conn error (1011)
  [788406.640132] connection2:0: ping timeout of 5 secs expired, recv
  timeout 5, last rx 4491991451, last ping 4491992702, now 4491993956
  [788406.640142] connection2:0: detected conn error (1011)
  [788406.928134] connection4:0: ping timeout of 5 secs expired, recv
  timeout 5, last rx 4491991524, last ping 4491992775, now 4491994028
  [788406.928150] connection4:0: detected conn error (1011)
  [788406.944147] connection5:0: ping timeout of 5 secs expired, recv
  timeout 5, last rx 4491991528, last ping 4491992779, now 4491994032
  [788406.944165] connection5:0: detected conn error (1011)
  [788408.640123] connection3:0: ping timeout of 5 secs expired, recv
  timeout 5, last rx 4491991954, last ping 4491993205, now 4491994456
  [788408.640134] connection3:0: detected conn error (1011)
  [788409.907968] connection1:0: detected conn error (1020)
  [788409.908280] connection2:0: detected conn error (1020)
  [788409.912683] connection4:0: detected conn error (1020)
  [788409.913152] connection5:0: detected conn error (1020)
  [788411.491818] connection3:0: detected conn error (1020)
 
 
  that repeats for a bit and then I see
 
  [1952161.012214] INFO: task /usr/sbin/httpd:27491 blocked for more
  than 480 seconds.
  [1952161.012219] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
  disables this 

Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-08-06 Thread Goldwyn Rodrigues
Hi Gavin,

On 08/06/2013 04:51 PM, Gavin Jones wrote:
 Hello Goldwyn,

 Thanks for taking a look at this.  So, then, it does seem to be DLM
 related.  We were running fine for a few weeks and then it came up
 again this morning and has been going on throughout the day.

 Regarding the DLM debugging, I allowed debugging for DLM_GLUE,
 DLM_THREAD, DLM_MASTER and DLM_RECOVERY.  However, I don't see any DLM
 logging output in dmesg or syslog --is there perhaps another way to
 get at the actual DLM log?  I've searched around a bit but didn't find
 anything that made it clear.

Unfortunately, CONFIG_OCFS2_DEBUG_MASKLOG is not enabled in openSUSE
kernels, only in SLES kernels. Sorry about that :(

However, you can recompile the kernel with this enabled in the config file.
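
Something along these lines (a sketch for a self-built kernel; paths
and the exact steps are assumptions, adjust for your setup):

  zgrep OCFS2_DEBUG_MASKLOG /proc/config.gz   # confirm it is off in the running kernel
  # in the kernel source tree, set CONFIG_OCFS2_DEBUG_MASKLOG=y
  # (e.g. via make menuconfig), then rebuild and install:
  make -j8 && make modules_install && make install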


 As for OCFS2 and iSCSI communications, they use the same physical
 network interface but different VLANs on that interface.  The
 connectionX:0 errors, then, seem to indicate an issue with the iSCSI
 connection.  The system logs and monitoring software don't show any
 warnings or errors about the interface going down, so the only thing I
 can think of is the connection load balancing on the SAN, though
 that's merely a hunch.  Maybe I should mail the list and see if anyone
 has a similar setup.

You will not have anything in the logs if the network issues are
intermittent. Perhaps a simple ping while the issue is occurring is the
best tool.
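
Something as simple as the following, left running on the storage VLAN,
would show drops (a sketch; the address below is a placeholder for your
SAN group IP):

  SAN_IP=172.16.40.1    # placeholder -- substitute the SAN group address
  while true; do
      date; ping -c 3 -W 1 $SAN_IP
      sleep 5
  done >> /tmp/san-ping.log 2>&1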

My suspicion of network issues keeps getting stronger with the
information you have given me so far. Also, as Sunil mentioned, you
have a problem anyway if the storage does not respond.


 If you could please point me in the right direction to make use of the
 DLM debugging via debugfs.ocfs2, I would appreciate it.


snipped

-- 
Goldwyn

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-07-18 Thread Gavin Jones
Hello,

Sure, I'd be happy to provide such information next time this occurs.

Can you elaborate, or point me at documentation / procedure regarding
the DLM debug logs and what would be helpful to see?  I have read
"Troubleshooting OCFS2" [1] and the section "Debugging File System
Locks" -- is this what you're referring to?
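
If it is that section, my reading is the commands follow this pattern
(a sketch; the device and lock name are placeholders):

  debugfs.ocfs2 -R "fs_locks -B" /dev/sdd1        # list only the busy locks
  debugfs.ocfs2 -R "dlm_locks LOCKID" /dev/sdd1   # holders/waiters for one lock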

Not sure if this will provide additional context or just muddy the
waters, but I thought I'd provide some syslog messages from an affected
server from the last time this occurred.

Jul 14 15:36:55 slipapp07 kernel: [2173588.704093] o2net: Connection
to node slipapp03 (num 2) at 172.16.40.122: has been idle for
30.97 secs, shutting it down.
Jul 14 15:36:55 slipapp07 kernel: [2173588.704146] o2net: No longer
connected to node slipapp03 (num 2) at 172.16.40.122:
Jul 14 15:36:55 slipapp07 kernel: [2173588.704279]
(kworker/u:1,12787,4):dlm_do_assert_master:1665 ERROR: Error -112 when
sending message 502 (key 0xdc8be796) to node 2
Jul 14 15:36:55 slipapp07 kernel: [2173588.704295]
(kworker/u:5,26056,5):dlm_do_master_request:1332 ERROR: link to 2 went
down!
Jul 14 15:36:55 slipapp07 kernel: [2173588.704301]
(kworker/u:5,26056,5):dlm_get_lock_resource:917 ERROR: status = -112
Jul 14 15:37:25 slipapp07 kernel: [2173618.784153] o2net: No
connection established with node 2 after 30.0 seconds, giving up.
snip
Jul 14 15:39:14 slipapp07 kernel: [2173727.920793]
(kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -112 when
sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.920833]
(/usr/sbin/httpd,5023,5):dlm_send_remote_lock_request:336 ERROR:
A08674A831ED4048B5136BD8613B21E0: res N0152a8da, Error -112
send CREATE LOCK to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.930562]
(kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.944998]
(kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.951511]
(kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.973848]
(kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173727.990216]
(kworker/u:2,13894,7):dlm_do_assert_master:1665 ERROR: Error -107 when
sending message 502 (key 0xdc8be796) to node 4
Jul 14 15:39:14 slipapp07 kernel: [2173728.024139]
(/usr/sbin/httpd,5023,5):dlm_send_remote_lock_request:336 ERROR:
A08674A831ED4048B5136BD8613B21E0: res N0152a8da, Error -107
send CREATE LOCK to node 4
snip, many, many more like the above

Which I suppose would indicate DLM issues; I have previously tried to
investigate this (via the above-mentioned guide) but was unable to make
real headway.
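
For what it's worth, the 30 second figure in the idle-timeout messages
above appears to correspond to the o2cb network idle timeout.  The
knobs live in /etc/sysconfig/o2cb; the values below are, as far as I
understand, the defaults (shown for illustration, not as a tuning
recommendation):

  O2CB_HEARTBEAT_THRESHOLD=31    # disk heartbeat dead threshold
  O2CB_IDLE_TIMEOUT_MS=30000     # behind the "idle for 30.97 secs" message
  O2CB_KEEPALIVE_DELAY_MS=2000
  O2CB_RECONNECT_DELAY_MS=2000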

I apologize for the rather basic questions...

Thanks,

Gavin W. Jones
Where 2 Get It, Inc.

[1]:  http://docs.oracle.com/cd/E37670_01/E37355/html/ol_tshoot_ocfs2.html

On Wed, Jul 17, 2013 at 7:07 AM, Goldwyn Rodrigues rgold...@suse.de wrote:
 Hi Gavin,


 On 07/16/2013 01:17 PM, Gavin Jones wrote:

   Hello,

 Apologies for my earlier reply, I did not see the request for Cluster
 size as well as block size.

 According to o2info, cluster size is 65536.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 On Tue, Jul 16, 2013 at 9:58 AM, Gavin Jones gjo...@where2getit.com
 wrote:

 Hello,

 Block size: 4kB

 Kernel version:  3.4.6-2.10-default

 OCFS2:  1.5.0

 Distribution is openSUSE 12.2.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 On Mon, Jul 15, 2013 at 7:32 PM, Srinivas Eeda srinivas.e...@oracle.com
 wrote:

 I am not entirely sure about significant slowdown and cluster outage.
 But from your description and information you provided, you are seeing
 fragmentation related issues. What is the ocfs2/kernel version and what
 is the cluster size/block size of these volumes?


 On 07/15/2013 01:33 PM, Gavin Jones wrote:

 Hello,

 We have a 16 node OCFS2 cluster used for web serving duties.  Each
 node mounts (the same) 6 OCFS2 volumes.  Shared data includes client
 files, application files for our webapp, log files, configuration
 files.  Storage provided by 2x EqualLogic PS400E iSCSI SANs, each
 having 12 drives in a RAID50; units are in a 'Group'.

 The problem we are having is that periodically, maybe once a week or
 so, we get several Apache processes on a handful of nodes that get
 stuck in D state and are unable to recover.  This greatly increases
 server load, causes more Apache processes to back up, OCFS2 starts
 complaining about unresponsive nodes and before you know it, the
 cluster is down.


 This seems like a DLM issue. Could you provide the /proc/pid/stack of the
 process when the issue happens next? Does it change over time?

 If it is indeed stuck waiting on a DLM lock, the 

Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-07-17 Thread Goldwyn Rodrigues
Hi Gavin,

On 07/16/2013 01:17 PM, Gavin Jones wrote:
   Hello,

 Apologies for my earlier reply, I did not see the request for Cluster
 size as well as block size.

 According to o2info, cluster size is 65536.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 On Tue, Jul 16, 2013 at 9:58 AM, Gavin Jones gjo...@where2getit.com wrote:
 Hello,

 Block size: 4kB

 Kernel version:  3.4.6-2.10-default

 OCFS2:  1.5.0

 Distribution is openSUSE 12.2.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 On Mon, Jul 15, 2013 at 7:32 PM, Srinivas Eeda srinivas.e...@oracle.com 
 wrote:
 I am not entirely sure about significant slowdown and cluster outage.
 But from your description and information you provided, you are seeing
 fragmentation related issues. What is the ocfs2/kernel version and what
 is the cluster size/block size of these volumes?


 On 07/15/2013 01:33 PM, Gavin Jones wrote:
 Hello,

 We have a 16 node OCFS2 cluster used for web serving duties.  Each
 node mounts (the same) 6 OCFS2 volumes.  Shared data includes client
 files, application files for our webapp, log files, configuration
 files.  Storage provided by 2x EqualLogic PS400E iSCSI SANs, each
 having 12 drives in a RAID50; units are in a 'Group'.

 The problem we are having is that periodically, maybe once a week or
 so, we get several Apache processes on a handful of nodes that get
 stuck in D state and are unable to recover.  This greatly increases
 server load, causes more Apache processes to back up, OCFS2 starts
 complaining about unresponsive nodes and before you know it, the
 cluster is down.

This seems like a DLM issue. Could you provide the /proc/pid/stack of 
the process when the issue happens next? Does it change over time?

If it is indeed stuck waiting on a DLM lock, the debug logs of DLM* 
might help (debugfs.ocfs2 -l).



 This seems to occur most often when we are doing writes + reads; if it
 is just reads the cluster hums along.  However, when we need to update
 many files or remove lots of files (think temporary images) in
 addition to normal read activity, we have the above-mentioned problem.

 We have done some searching and found
 http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg05525.html
 which describes a similar problem with write activity.  In that case,
 the problem was allocating contiguous space on a fragmented filesystem
 and the solution was to adjust the mount option 'localalloc'.  We are
 wondering if we are in a similar position.

 Below is the output from the stat_sysdir_analyze.sh script mentioned
 in the link above, which analyzes stat_sysdir.sh output; I've included
 the two volumes that seem to be our 'problem' volumes.

 Volume 1:
 bash stat_sysdir_analyze.sh sde1-client-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
             4549 | 510 and smaller
             1825 | 511

 Volume 2:
 bash stat_sysdir_analyze.sh sdd1-data-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
              175 | 510 and smaller
               23 | 511

 Any evidence here of excessive fragmentation that tuning localalloc
 would help with?

 Also regarding localalloc, I notice it is different for the above two
 volumes on many of the nodes; I find this interesting as the cluster
 is supposed to make an educated guess on this value.  For instance:

 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sde1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sdd1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=9,coherency=full,user_xattr,noacl)
 /dev/sdb1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=11,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=7,coherency=full,user_xattr,noacl)

 I'm not sure why the cluster would be picking different values
 depending on the node?

 Anyway, any opinions, advice, tuning suggestions greatly appreciated.
 This business of the cluster hanging is turning into quite a problem.

 I'll provide any other requested 

Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-07-16 Thread Gavin Jones
Hello,

Block size: 4kB

Kernel version:  3.4.6-2.10-default

OCFS2:  1.5.0

Distribution is openSUSE 12.2.

Thanks,

Gavin W. Jones
Where 2 Get It, Inc.

On Mon, Jul 15, 2013 at 7:32 PM, Srinivas Eeda srinivas.e...@oracle.com wrote:
 I am not entirely sure about significant slowdown and cluster outage.
 But from your description and information you provided, you are seeing
 fragmentation related issues. What is the ocfs2/kernel version and what
 is the cluster size/block size of these volumes?


 On 07/15/2013 01:33 PM, Gavin Jones wrote:
 Hello,

 We have a 16 node OCFS2 cluster used for web serving duties.  Each
 node mounts (the same) 6 OCFS2 volumes.  Shared data includes client
 files, application files for our webapp, log files, configuration
 files.  Storage provided by 2x EqualLogic PS400E iSCSI SANs, each
 having 12 drives in a RAID50; units are in a 'Group'.

 The problem we are having is that periodically, maybe once a week or
 so, we get several Apache processes on a handful of nodes that get
 stuck in D state and are unable to recover.  This greatly increases
 server load, causes more Apache processes to back up, OCFS2 starts
 complaining about unresponsive nodes and before you know it, the
 cluster is down.

 This seems to occur most often when we are doing writes + reads; if it
 is just reads the cluster hums along.  However, when we need to update
 many files or remove lots of files (think temporary images) in
 addition to normal read activity, we have the above-mentioned problem.

 We have done some searching and found
 http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg05525.html
 which describes a similar problem with write activity.  In that case,
 the problem was allocating contiguous space on a fragmented filesystem
 and the solution was to adjust the mount option 'localalloc'.  We are
 wondering if we are in a similar position.

 Below is the output from the stat_sysdir_analyze.sh script mentioned
 in the link above, which analyzes stat_sysdir.sh output; I've included
 the two volumes that seem to be our 'problem' volumes.

 Volume 1:
 bash stat_sysdir_analyze.sh sde1-client-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
             4549 | 510 and smaller
             1825 | 511

 Volume 2:
 bash stat_sysdir_analyze.sh sdd1-data-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
              175 | 510 and smaller
               23 | 511

 Any evidence here of excessive fragmentation that tuning localalloc
 would help with?

 Also regarding localalloc, I notice it is different for the above two
 volumes on many of the nodes; I find this interesting as the cluster
 is supposed to make an educated guess on this value.  For instance:

 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sde1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sdd1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=9,coherency=full,user_xattr,noacl)
 /dev/sdb1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=11,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=7,coherency=full,user_xattr,noacl)

 I'm not sure why the cluster would be picking different values
 depending on the node?

 Anyway, any opinions, advice, tuning suggestions greatly appreciated.
 This business of the cluster hanging is turning into quite a problem.

 I'll provide any other information upon request.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 --
 There has grown up in the minds of certain groups in this country the
 notion that because a man or corporation has made a profit out of the
 public for a number of years, the government and the courts are
 charged with the duty of guaranteeing such profit in the future, even
 in the face of changing circumstances and contrary to public interest.
 This strange doctrine is not supported by statute nor common law.

 ~Robert Heinlein

 ___
 Ocfs2-users mailing list
 

Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-07-16 Thread Gavin Jones
 Hello,

Apologies for my earlier reply; I did not see the request for cluster
size as well as block size.

According to o2info, cluster size is 65536.
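
For reference, that number comes from something like the following
(assuming a reasonably recent ocfs2-tools; the device name is an
example):

  o2info --volinfo /dev/sdd1   # prints block size, cluster size, label, features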

Thanks,

Gavin W. Jones
Where 2 Get It, Inc.

On Tue, Jul 16, 2013 at 9:58 AM, Gavin Jones gjo...@where2getit.com wrote:
 Hello,

 Block size: 4kB

 Kernel version:  3.4.6-2.10-default

 OCFS2:  1.5.0

 Distribution is openSUSE 12.2.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 On Mon, Jul 15, 2013 at 7:32 PM, Srinivas Eeda srinivas.e...@oracle.com 
 wrote:
 I am not entirely sure about significant slowdown and cluster outage.
 But from your description and information you provided, you are seeing
 fragmentation related issues. What is the ocfs2/kernel version and what
 is the cluster size/block size of these volumes?


 On 07/15/2013 01:33 PM, Gavin Jones wrote:
 Hello,

 We have a 16 node OCFS2 cluster used for web serving duties.  Each
 node mounts (the same) 6 OCFS2 volumes.  Shared data includes client
 files, application files for our webapp, log files, configuration
 files.  Storage provided by 2x EqualLogic PS400E iSCSI SANs, each
 having 12 drives in a RAID50; units are in a 'Group'.

 The problem we are having is that periodically, maybe once a week or
 so, we get several Apache processes on a handful of nodes that get
 stuck in D state and are unable to recover.  This greatly increases
 server load, causes more Apache processes to back up, OCFS2 starts
 complaining about unresponsive nodes and before you know it, the
 cluster is down.

 This seems to occur most often when we are doing writes + reads; if it
 is just reads the cluster hums along.  However, when we need to update
 many files or remove lots of files (think temporary images) in
 addition to normal read activity, we have the above-mentioned problem.

 We have done some searching and found
 http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg05525.html
 which describes a similar problem with write activity.  In that case,
 the problem was allocating contiguous space on a fragmented filesystem
 and the solution was to adjust the mount option 'localalloc'.  We are
 wondering if we are in a similar position.

 Below is the output from the stat_sysdir_analyze.sh script mentioned
 in the link above, which analyzes stat_sysdir.sh output; I've included
 the two volumes that seem to be our 'problem' volumes.

 Volume 1:
 bash stat_sysdir_analyze.sh sde1-client-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
             4549 | 510 and smaller
             1825 | 511

 Volume 2:
 bash stat_sysdir_analyze.sh sdd1-data-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
              175 | 510 and smaller
               23 | 511

 Any evidence here of excessive fragmentation that tuning localalloc
 would help with?

 Also regarding localalloc, I notice it is different for the above two
 volumes on many of the nodes; I find this interesting as the cluster
 is supposed to make an educated guess on this value.  For instance:

 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sde1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sdd1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=9,coherency=full,user_xattr,noacl)
 /dev/sdb1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=11,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=7,coherency=full,user_xattr,noacl)

 I'm not sure why the cluster would be picking different values
 depending on the node?

 Anyway, any opinions, advice, tuning suggestions greatly appreciated.
 This business of the cluster hanging is turning into quite a problem.

 I'll provide any other information upon request.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 --
 There has grown up in the minds of certain groups in this country the
 notion that because a man or corporation has made a profit out of the
 public for a number of years, the government and the courts are
 charged with the 

[Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-07-15 Thread Gavin Jones
Hello,

We have a 16 node OCFS2 cluster used for web serving duties.  Each
node mounts (the same) 6 OCFS2 volumes.  Shared data includes client
files, application files for our webapp, log files, configuration
files.  Storage provided by 2x EqualLogic PS400E iSCSI SANs, each
having 12 drives in a RAID50; units are in a 'Group'.

The problem we are having is that periodically, maybe once a week or
so, we get several Apache processes on a handful of nodes that get
stuck in D state and are unable to recover.  This greatly increases
server load, causes more Apache processes to back up, OCFS2 starts
complaining about unresponsive nodes and before you know it, the
cluster is down.

This seems to occur most often when we are doing writes + reads; if it
is just reads the cluster hums along.  However, when we need to update
many files or remove lots of files (think temporary images) in
addition to normal read activity, we have the above-mentioned problem.

We have done some searching and found
http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg05525.html
which describes a similar problem with write activity.  In that case,
the problem was allocating contiguous space on a fragmented filesystem
and the solution was to adjust the mount option 'localalloc'.  We are
wondering if we are in a similar position.

Below is the output from the stat_sysdir_analyze.sh script mentioned
in the link above, which analyzes stat_sysdir.sh output; I've included
the two volumes that seem to be our 'problem' volumes.

Volume 1:
bash stat_sysdir_analyze.sh sde1-client-20130715.txt
Number of clust. | Contiguous cluster size
-----------------+------------------------
            4549 | 510 and smaller
            1825 | 511

Volume 2:
bash stat_sysdir_analyze.sh sdd1-data-20130715.txt
Number of clust. | Contiguous cluster size
-----------------+------------------------
             175 | 510 and smaller
              23 | 511

Any evidence here of excessive fragmentation that tuning localalloc
would help with?
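
A related check, if the tools support it: newer ocfs2-tools can report
free-space fragmentation directly (a sketch; the chunk size in KB and
the device are examples):

  o2info --freefrag 1024 /dev/sde1   # free-extent histogram against 1MB chunks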

Also regarding localalloc, I notice it is different for the above two
volumes on many of the nodes; I find this interesting as the cluster
is supposed to make an educated guess on this value.  For instance:

/dev/sda1 on /u/client type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
/dev/sde1 on /u/data type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


/dev/sdd1 on /u/client type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=9,coherency=full,user_xattr,noacl)
/dev/sdb1 on /u/data type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


/dev/sda1 on /u/client type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=11,coherency=full,user_xattr,noacl)
/dev/sdc1 on /u/data type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


/dev/sda1 on /u/client type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
/dev/sdc1 on /u/data type ocfs2
(rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=7,coherency=full,user_xattr,noacl)

I'm not sure why the cluster would be picking different values
depending on the node?
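
If pinning it uniformly turns out to be the answer, my understanding is
that the guess can be overridden with an explicit mount option (a
sketch; the 16MB window is an arbitrary example, not a recommendation):

  mount -o localalloc=16 /dev/sda1 /u/client
  # or persistently, via /etc/fstab:
  # /dev/sda1  /u/client  ocfs2  _netdev,localalloc=16  0 0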

Anyway, any opinions, advice, tuning suggestions greatly appreciated.
This business of the cluster hanging is turning into quite a problem.

I'll provide any other information upon request.

Thanks,

Gavin W. Jones
Where 2 Get It, Inc.

--
There has grown up in the minds of certain groups in this country the
notion that because a man or corporation has made a profit out of the
public for a number of years, the government and the courts are
charged with the duty of guaranteeing such profit in the future, even
in the face of changing circumstances and contrary to public interest.
This strange doctrine is not supported by statute nor common law.

~Robert Heinlein

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

2013-07-15 Thread Srinivas Eeda
I am not entirely sure about the significant slowdown and cluster
outage. But from your description and the information you provided, you
are seeing fragmentation-related issues. What is the ocfs2/kernel
version and what is the cluster size/block size of these volumes?


On 07/15/2013 01:33 PM, Gavin Jones wrote:
 Hello,

 We have a 16 node OCFS2 cluster used for web serving duties.  Each
 node mounts (the same) 6 OCFS2 volumes.  Shared data includes client
 files, application files for our webapp, log files, configuration
 files.  Storage provided by 2x EqualLogic PS400E iSCSI SANs, each
 having 12 drives in a RAID50; units are in a 'Group'.

 The problem we are having is that periodically, maybe once a week or
 so, we get several Apache processes on a handful of nodes that get
 stuck in D state and are unable to recover.  This greatly increases
 server load, causes more Apache processes to back up, OCFS2 starts
 complaining about unresponsive nodes and before you know it, the
 cluster is down.

 This seems to occur most often when we are doing writes + reads; if it
 is just reads the cluster hums along.  However, when we need to update
 many files or remove lots of files (think temporary images) in
 addition to normal read activity, we have the above-mentioned problem.

 We have done some searching and found
 http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg05525.html
 which describes a similar problem with write activity.  In that case,
 the problem was allocating contiguous space on a fragmented filesystem
 and the solution was to adjust the mount option 'localalloc'.  We are
 wondering if we are in a similar position.

 Below is the output from the stat_sysdir_analyze.sh script mentioned
 in the link above, which analyzes stat_sysdir.sh output; I've included
 the two volumes that seem to be our 'problem' volumes.

 Volume 1:
 bash stat_sysdir_analyze.sh sde1-client-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
             4549 | 510 and smaller
             1825 | 511

 Volume 2:
 bash stat_sysdir_analyze.sh sdd1-data-20130715.txt
 Number of clust. | Contiguous cluster size
 -----------------+------------------------
              175 | 510 and smaller
               23 | 511

 Any evidence here of excessive fragmentation that tuning localalloc
 would help with?

 Also regarding localalloc, I notice it is different for the above two
 volumes on many of the nodes; I find this interesting as the cluster
 is supposed to make an educated guess on this value.  For instance:

 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sde1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sdd1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=9,coherency=full,user_xattr,noacl)
 /dev/sdb1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=11,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=5,coherency=full,user_xattr,noacl)


 /dev/sda1 on /u/client type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=6,coherency=full,user_xattr,noacl)
 /dev/sdc1 on /u/data type ocfs2
 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,localalloc=7,coherency=full,user_xattr,noacl)

 I'm not sure why the cluster would be picking different values
 depending on the node?

 Anyway, any opinions, advice, tuning suggestions greatly appreciated.
 This business of the cluster hanging is turning into quite a problem.

 I'll provide any other information upon request.

 Thanks,

 Gavin W. Jones
 Where 2 Get It, Inc.

 --
 There has grown up in the minds of certain groups in this country the
 notion that because a man or corporation has made a profit out of the
 public for a number of years, the government and the courts are
 charged with the duty of guaranteeing such profit in the future, even
 in the face of changing circumstances and contrary to public interest.
 This strange doctrine is not supported by statute nor common law.

 ~Robert Heinlein

 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 https://oss.oracle.com/mailman/listinfo/ocfs2-users


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users