Re: [Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Sunil Mushran

-F does not run the full fsck. -f does.

But I would not recommend running fsck as this corruption is not
normal. The inodes in the system directory have been overwritten.
That typically means a storage issue. The fs does not create/remove
inodes in sysdir. Only the tools do that.

You may want to shutdown drbd and access the devices directly on
the two machines. See if they are ok. If so, then select one as the
master and copy it to the other.
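
In shell terms, that suggestion might look roughly like the sketch below. It is a dry-run: commands are echoed, not executed. The resource name "repdata" and backing device /dev/sda3 are assumptions taken from elsewhere in the thread; "node2" is a hypothetical hostname, and the final dd would destroy the target's contents, so verify everything before running any of it for real.

```shell
#!/bin/sh
# Dry-run sketch of the suggested recovery. Nothing is executed:
# run() only echoes each command; swap in "$@" to execute for real.
run() { echo "+ $*"; }

# 1. On both nodes: stop DRBD so the backing device can be opened directly.
run drbdadm down repdata

# 2. On both nodes: read-only fsck of the backing device (assumed /dev/sda3).
run fsck.ocfs2 -n /dev/sda3

# 3. If only one copy is good, select it as master and clone it to the
#    other node (hypothetical hostname node2). Destructive on the target!
run "dd if=/dev/sda3 bs=1M | ssh node2 'dd of=/dev/sda3 bs=1M'"
```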

On 09/15/2011 04:20 PM, Mike Reid wrote:

I may have made some progress on my OCFS2 error:

See the following output from "dmesg"

[88740.345617] OCFS2: ERROR (device drbd0): ocfs2_validate_inode_block: Invalid 
dinode #11: fs_generation is 376662488
[88740.345664] File system is now read-only due to the potential of on-disk 
corruption. Please run fsck.ocfs2 once the file system is unmounted.
[88740.345710] (mount.ocfs2,26394,5):ocfs2_read_locked_inode:499 ERROR: status 
= -22
[88740.345743] (mount.ocfs2,26394,5):_ocfs2_get_system_file_inode:120 ERROR: 
status = -116
[88740.345807] (mount.ocfs2,26394,5):ocfs2_init_global_system_inodes:466 ERROR: 
status = -22
[88740.345890] (mount.ocfs2,26394,5):ocfs2_init_global_system_inodes:469 ERROR: 
Unable to load system inode 4, possibly corrupt fs?
[88740.345958] (mount.ocfs2,26394,5):ocfs2_initialize_super:2261 ERROR: status 
= -22
[88740.346067] (mount.ocfs2,26394,5):ocfs2_fill_super:1023 ERROR: status = -22
[88740.346124] ocfs2: Unmounting device (147,0) on (node 0)
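
As an aside, the negative "status" values in these messages are ordinary kernel errno codes; a quick decoding sketch (not part of the original thread):

```python
import errno
import os

# Decode the negative "status" values from the OCFS2 dmesg output.
# On Linux: 22 = EINVAL, 116 = ESTALE, 2 = ENOENT.
for status in (-22, -116, -2):
    code = -status
    print(f"status = {status}: {errno.errorcode[code]} ({os.strerror(code)})")
```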


I decided to run "fsck.ocfs2 -F /dev/drbd0":

Checking OCFS2 filesystem in /dev/drbd0:
  label: 
  uuid:   fe 42 73 e1 f8 66 45 41 bb cf 66 c5 df d4 96 d6
  number of blocks:   2436
  bytes per block:    4096
  number of clusters: 2436
  bytes per cluster:  4096
  max slots:  8

/dev/drbd0 wasn't cleanly unmounted by all nodes.  Attempting to replay the 
journals for nodes that didn't unmount cleanly
Checking each slot's journal.
Replaying slot 0's journal.
Slot 0's journal replayed successfully.
Slot 0's local alloc replayed successfully
/dev/drbd0 is clean.  It will be checked after 20 additional mounts.
Slot 0's journal dirty flag removed


Unfortunately, I still cannot mount the fs

> mount -t ocfs2 /dev/drbd/by-res/repdata /data

(see attached strace) 


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Mike Reid
I may have made some progress on my OCFS2 error:

See the following output from "dmesg"

[88740.345617] OCFS2: ERROR (device drbd0): ocfs2_validate_inode_block:
Invalid dinode #11: fs_generation is 376662488
[88740.345664] File system is now read-only due to the potential of on-disk
corruption. Please run fsck.ocfs2 once the file system is unmounted.
[88740.345710] (mount.ocfs2,26394,5):ocfs2_read_locked_inode:499 ERROR:
status = -22
[88740.345743] (mount.ocfs2,26394,5):_ocfs2_get_system_file_inode:120 ERROR:
status = -116
[88740.345807] (mount.ocfs2,26394,5):ocfs2_init_global_system_inodes:466
ERROR: status = -22
[88740.345890] (mount.ocfs2,26394,5):ocfs2_init_global_system_inodes:469
ERROR: Unable to load system inode 4, possibly corrupt fs?
[88740.345958] (mount.ocfs2,26394,5):ocfs2_initialize_super:2261 ERROR:
status = -22
[88740.346067] (mount.ocfs2,26394,5):ocfs2_fill_super:1023 ERROR: status =
-22
[88740.346124] ocfs2: Unmounting device (147,0) on (node 0)


I decided to run "fsck.ocfs2 -F /dev/drbd0":

Checking OCFS2 filesystem in /dev/drbd0:
  label:  
  uuid:   fe 42 73 e1 f8 66 45 41 bb cf 66 c5 df d4 96 d6
  number of blocks:   2436
  bytes per block:    4096
  number of clusters: 2436
  bytes per cluster:  4096
  max slots:  8

/dev/drbd0 wasn't cleanly unmounted by all nodes.  Attempting to replay the
journals for nodes that didn't unmount cleanly
Checking each slot's journal.
Replaying slot 0's journal.
Slot 0's journal replayed successfully.
Slot 0's local alloc replayed successfully
/dev/drbd0 is clean.  It will be checked after 20 additional mounts.
Slot 0's journal dirty flag removed


Unfortunately, I still cannot mount the fs

> mount -t ocfs2 /dev/drbd/by-res/repdata /data

(see attached strace)


mount_trace_3.txt
Description: Binary data

Re: [Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Mike Reid
Interesting observation. Thank you, Sunil.

I should note that I could not figure out how to get a stack trace from
within Pacemaker directly, so I waited for Pacemaker to start
O2CB/OCFS2/DLM and then tried to mount manually to get the trace.

I've noticed that as soon as it fails (via Pacemaker), the DRBD Primary
device gets demoted to Secondary... I wonder if the mount attempt was
perhaps too late and /dev/drbd0 was already in the Secondary state? That
seems likely, since it would satisfy the first condition: if(
mdev->state.role != R_PRIMARY ) { ...
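
One way to test that theory manually (a sketch, not from the thread; the resource name "repdata" is assumed from the mount path /dev/drbd/by-res/repdata) is to check the local DRBD role just before mounting:

```shell
# Sketch: only attempt the OCFS2 mount if the local DRBD role is Primary.
# "repdata" is an assumed resource name; adjust devices/paths to your setup.
local_role() {
    # "drbdadm role <res>" prints e.g. "Primary/Secondary";
    # keep the local (first) half.
    drbdadm role repdata 2>/dev/null | cut -d/ -f1
}

mount_if_primary() {
    role=$(local_role)
    if [ "$role" = "Primary" ]; then
        mount -t ocfs2 /dev/drbd0 /data
    else
        echo "refusing to mount: local DRBD role is '${role:-unknown}'" >&2
        return 1
    fi
}
```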

I wonder what else I could try (manually or via pacemaker) to help determine
what may be at fault here?

Normally I can set a node to standby, and then back online with no
issues...but somehow this node will no longer join, even after rebooting
both nodes in the cluster, etc..


From: Sunil Mushran 
Date: Thu, 15 Sep 2011 13:42:54 -0700
To: Mike Reid 
Cc: 
Subject: Re: [Ocfs2-users] Trouble getting node to re-join two node cluster
(OCFS2/DRBD Primary/Primary)

 open("/dev/drbd0", O_RDONLY|O_DIRECT) = -1 EMEDIUMTYPE (Wrong medium type)
 
 drbd_open()
 ...
 if (mdev->state.role != R_PRIMARY) {
         if (mode & FMODE_WRITE)
                 rv = -EROFS;
         else if (!allow_oos)
                 rv = -EMEDIUMTYPE;
 }
 ...
 
 So the failure appears to be emanating from drbd. There seems
 to be an allow_oos module param that is not 0. I have no idea
 what this param does. Also, I am reading current mainline; 2.6.35 may
 be different.
 
 On 09/15/2011 01:26 PM, Mike Reid wrote:
>  
> Hello all,
> 
> ** I have also posted this in the pacemaker list, but I have a feeling it's
> more OCFS2 specific **
> 
> We have a two-node cluster still in development that has been running fine
> for weeks (little to no traffic). I made some updates to our CIB recently,
> and everything seemed just fine.
> 
> Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume, and once it
> was complete one of the nodes had become completely disconnected and I
> haven't been able to reconnect since.
> 
> DRBD is working fine, everything is UpToDate and I can get both nodes in
> Primary/Primary, but when it comes down to starting OCFS2 and mounting the
> volume, I'm left with:
> 
>> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error
> 
> I am using "pcmk" as the cluster_stack, and letting Pacemaker control
> everything...
> 
> The last time this happened the only way I was able to resolve it was to
> reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do
> this, underlying blocks seem fine, and one of the nodes is running just
> fine. The (currently) unmounted node is staying in sync as far as DRBD is
> concerned.
> 
> Here's some detail that hopefully will help, please let me know if there's
> anything else I can provide to help know the best way to get this node back
> "online":
> 
> 
> Ubuntu 10.10 / Kernel 2.6.35
> 
> Pacemaker 1.0.9.1
> Corosync 1.2.1
> Cluster Agents 1.0.3 (Heartbeat)
> Cluster Glue 1.0.6
> OpenAIS 1.1.2
> 
> DRBD 8.3.10
> OCFS2 1.5.0
> 
> cat /sys/fs/ocfs2/cluster_stack = pcmk
> 
> node1: mounted.ocfs2 -d
> 
> DeviceFS UUID  Label
> /dev/sda3 ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6
> 
> node2: mounted.ocfs2 -d
> 
> DeviceFS UUID  Label
> /dev/sda3 ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
> /dev/drbd0ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
> 
> * NOTES:
> - Both nodes are identical, in fact one node is a direct mirror (hdd clone)
> - I have attached the CIB (crm configure edit contents) and mount trace
> 
>  
> 
> 


Re: [Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Sunil Mushran

open("/dev/drbd0", O_RDONLY|O_DIRECT) = -1 EMEDIUMTYPE (Wrong medium type)

drbd_open()
...
if (mdev->state.role != R_PRIMARY) {
        if (mode & FMODE_WRITE)
                rv = -EROFS;
        else if (!allow_oos)
                rv = -EMEDIUMTYPE;
}
...

So the failure appears to be emanating from drbd. There seems
to be an allow_oos module param that is not 0. I have no idea
what this param does. Also, I am reading current mainline; 2.6.35 may
be different.

On 09/15/2011 01:26 PM, Mike Reid wrote:

Hello all,

** I have also posted this in the pacemaker list, but I have a feeling it's
more OCFS2 specific **

We have a two-node cluster still in development that has been running fine
for weeks (little to no traffic). I made some updates to our CIB recently,
and everything seemed just fine.

Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume, and once it
was complete one of the nodes had become completely disconnected and I
haven't been able to reconnect since.

DRBD is working fine, everything is UpToDate and I can get both nodes in
Primary/Primary, but when it comes down to starting OCFS2 and mounting the
volume, I'm left with:


resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error

I am using "pcmk" as the cluster_stack, and letting Pacemaker control
everything...

The last time this happened the only way I was able to resolve it was to
reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do
this, underlying blocks seem fine, and one of the nodes is running just
fine. The (currently) unmounted node is staying in sync as far as DRBD is
concerned.

Here's some detail that hopefully will help, please let me know if there's
anything else I can provide to help know the best way to get this node back
"online":


Ubuntu 10.10 / Kernel 2.6.35

Pacemaker 1.0.9.1
Corosync 1.2.1
Cluster Agents 1.0.3 (Heartbeat)
Cluster Glue 1.0.6
OpenAIS 1.1.2

DRBD 8.3.10
OCFS2 1.5.0

cat /sys/fs/ocfs2/cluster_stack = pcmk

node1: mounted.ocfs2 -d

DeviceFS UUID  Label
/dev/sda3 ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6

node2: mounted.ocfs2 -d

DeviceFS UUID  Label
/dev/sda3 ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
/dev/drbd0ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef

* NOTES:
- Both nodes are identical, in fact one node is a direct mirror (hdd clone)
- I have attached the CIB (crm configure edit contents) and mount trace






[Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Mike Reid
Hello all,

** I have also posted this in the pacemaker list, but I have a feeling it's
more OCFS2 specific **

We have a two-node cluster still in development that has been running fine
for weeks (little to no traffic). I made some updates to our CIB recently,
and everything seemed just fine.

Yesterday I attempted to untar ~1.5GB to the OCFS2/DRBD volume, and once it
was complete one of the nodes had become completely disconnected and I
haven't been able to reconnect since.

DRBD is working fine, everything is UpToDate and I can get both nodes in
Primary/Primary, but when it comes down to starting OCFS2 and mounting the
volume, I'm left with:

> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error

I am using "pcmk" as the cluster_stack, and letting Pacemaker control
everything...

The last time this happened the only way I was able to resolve it was to
reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do
this, underlying blocks seem fine, and one of the nodes is running just
fine. The (currently) unmounted node is staying in sync as far as DRBD is
concerned.

Here's some detail that hopefully will help, please let me know if there's
anything else I can provide to help know the best way to get this node back
"online":


Ubuntu 10.10 / Kernel 2.6.35

Pacemaker 1.0.9.1
Corosync 1.2.1
Cluster Agents 1.0.3 (Heartbeat)
Cluster Glue 1.0.6
OpenAIS 1.1.2

DRBD 8.3.10
OCFS2 1.5.0

cat /sys/fs/ocfs2/cluster_stack = pcmk

node1: mounted.ocfs2 -d

DeviceFS UUID  Label
/dev/sda3 ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6

node2: mounted.ocfs2 -d

DeviceFS UUID  Label
/dev/sda3 ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
/dev/drbd0ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef

* NOTES:
- Both nodes are identical, in fact one node is a direct mirror (hdd clone)
- I have attached the CIB (crm configure edit contents) and mount trace



crm_configure.txt
Description: Binary data


mount_trace.txt
Description: Binary data

Re: [Ocfs2-users] The mounting of too many OCFS2 volumes (i.e. 50 or more) per cluster

2011-09-15 Thread Sunil Mushran
That's very old. We have users having 50+ mounts. The one disadvantage
is that the o2cb stack heartbeats on all mounts. That problem will be addressed
in 1.8 (the tools will be released soon), with global heartbeat (hb volumes
are user-configurable).

Having said that, the number of volumes depends on the hardware capability.
It is hard to provide simple rules for this. The best solution is to test and
figure out the performance bottleneck.

On 09/15/2011 01:00 AM, Marko Sutic wrote:
> Hi list,
>
> I have a question concerning number of OCFS2 volumes per cluster.
>
> From our storage vendor we received recommendations how to configure mount 
> volumes per database to gain the best possible performance.
> Basically, we should separate redo logs,archive logs, temp, data, etc.
>
> This is not hard to configure but I'm concerned about this line that I've 
> found on Oracle support site:
>
> Linux OCFS2 - Best Practices [ID 603080.1]
> Number of volumes
> The mounting of too many OCFS2 volumes (i.e. 50 or more) per cluster is 
> likely to create a performance (process) bottleneck - this is not 
> specifically related to OCFS2. Ideally, it is desirable to have no more than 
> around 20 OCFS2 partitions per system.
> See also http://oss.oracle.com/bugzilla/show_bug.cgi?id=992
>
>
>
> In our configuration we would need more than 60 OCFS2 mount volumes per 
> cluster so I don't know should we expect any performance problems due to the 
> number of OCFS2 volumes?
> What is your recommendation about number of OCFS2 volumes per cluster 
> regarding performance and stability?
>
>
> Our kernel and ocfs2 version:
> # uname -rvp
> 2.6.18-274.0.0.0.1.el5 #1 SMP Mon Jul 25 14:33:14 EDT 2011 x86_64
>
> # rpm -qa|grep ocfs2
> ocfs2-tools-1.6.3-2.el5
> ocfs2-2.6.18-274.0.0.0.1.el5-1.4.8-2.el5
> ocfs2console-1.6.3-2.el5
>
>
>
> Thank you very much for your help.
>
> Regards,
> Marko Sutic




Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-15 Thread Sunil Mushran
I was hoping to get a readable stack. Please could you provide a link to
the coredump.

On 09/15/2011 02:51 AM, Betzos Giorgos wrote:
> Hello,
>
> I am sorry for the delay in responding. Unfortunately, it faulted again.
>
> Here is the log. Although my email client folds the Memory Map lines.
> The core file is available.
>
> Thanks,
>
> George
>
> # ./o2image.ppc.dbg /dev/mapper/mpath0 /files_shared/u02.o2image
> *** glibc detected *** ./o2image.ppc.dbg: corrupted double-linked list:
> 0x10075000 ***
> === Backtrace: =
> /lib/libc.so.6[0xfeb1ab4]
> /lib/libc.so.6(cfree+0xc8)[0xfeb5b68]
> ./o2image.ppc.dbg[0x1000d098]
> ./o2image.ppc.dbg[0x1000297c]
> ./o2image.ppc.dbg[0x10001eb8]
> ./o2image.ppc.dbg[0x1000228c]
> ./o2image.ppc.dbg[0x10002804]
> ./o2image.ppc.dbg[0x10001eb8]
> ./o2image.ppc.dbg[0x1000228c]
> ./o2image.ppc.dbg[0x10002804]
> ./o2image.ppc.dbg[0x10003bbc]
> ./o2image.ppc.dbg[0x10004480]
> /lib/libc.so.6[0xfe4dc60]
> /lib/libc.so.6[0xfe4dea0]
> === Memory map: 
> 0010-0012 r-xp 0010 00:00 0
> [vdso]
> 0f43-0f44 r-xp  08:13
> 180307 /lib/libcom_err.so.2.1
> 0f44-0f45 rw-p  08:13
> 180307 /lib/libcom_err.so.2.1
> 0f90-0f9c r-xp  08:13
> 180293 /lib/libglib-2.0.so.0.1200.3
> 0f9c-0f9d rw-p 000b 08:13
> 180293 /lib/libglib-2.0.so.0.1200.3
> 0fa4-0fa5 r-xp  08:13
> 180292 /lib/librt-2.5.so
> 0fa5-0fa6 r--p  08:13
> 180292 /lib/librt-2.5.so
> 0fa6-0fa7 rw-p 0001 08:13
> 180292 /lib/librt-2.5.so
> 0fce-0fd0 r-xp  08:13
> 180291 /lib/libpthread-2.5.so
> 0fd0-0fd1 r--p 0001 08:13
> 180291 /lib/libpthread-2.5.so
> 0fd1-0fd2 rw-p 0002 08:13
> 180291 /lib/libpthread-2.5.so
> 0fe3-0ffa r-xp  08:13
> 180288 /lib/libc-2.5.so
> 0ffa-0ffb r--p 0016 08:13
> 180288 /lib/libc-2.5.so
> 0ffb-0ffc rw-p 0017 08:13
> 180288 /lib/libc-2.5.so
> 0ffc-0ffe r-xp  08:13
> 180287 /lib/ld-2.5.so
> 0ffe-0fff r--p 0001 08:13
> 180287 /lib/ld-2.5.so
> 0fff-1000 rw-p 0002 08:13
> 180287 /lib/ld-2.5.so
> 1000-1005 r-xp  08:13
> 7487795/root/o2image.ppc.dbg
> 1005-1006 rw-p 0004 08:13
> 7487795/root/o2image.ppc.dbg
> 1006-1009 rwxp 1006 00:00 0
> [heap]
> f768-f7ff rw-p f768 00:00 0
> ff9a-ffaf rw-p ff9a 00:00 0
> [stack]
> Aborted (core dumped)
>
>
> On Thu, 2011-09-08 at 12:10 -0700, Sunil Mushran wrote:
>> http://oss.oracle.com/~smushran/o2image.ppc.dbg
>>
>> Use the above executable. Hoping it won't fault. But if it does
>> email me the backtrace. That trace will be readable as the exec
>> has debugging symbols enabled.
>>
>> On 09/07/2011 11:24 PM, Betzos Giorgos wrote:
>>> # rpm -q ocfs2-tools
>>> ocfs2-tools-1.4.4-1.el5.ppc
>>>
>>> On Wed, 2011-09-07 at 09:13 -0700, Sunil Mushran wrote:
 version of ocfs2-tools?

 On 09/07/2011 09:10 AM, Betzos Giorgos wrote:
> Hello,
>
> I tried what you suggested but here is what I got:
>
> # o2image /dev/mapper/mpath0 /files_shared/u02.o2image
> *** glibc detected *** o2image: corrupted double-linked list: 0x10045000 
> ***
> === Backtrace: =
> /lib/libc.so.6[0xfeb1ab4]
> /lib/libc.so.6(cfree+0xc8)[0xfeb5b68]
> o2image[0x10007bb0]
> o2image[0x10002748]
> o2image[0x10001f50]
> o2image[0x10002334]
> o2image[0x100026a0]
> o2image[0x10001f50]
> o2image[0x10002334]
> o2image[0x100026a0]
> o2image[0x1000358c]
> o2image[0x10003e28]
> /lib/libc.so.6[0xfe4dc60]
> /lib/libc.so.6[0xfe4dea0]
> === Memory map: 
> 0010-0012 r-xp 0010 00:00 0  
> [vdso]
> 0f55-0f56 r-xp  08:13 2881590
> /lib/libcom_err.so.2.1
> 0f56-0f57 rw-p  08:13 2881590
> /lib/libcom_err.so.2.1
> 0f90-0f9c r-xp  08:13 2881576
> /lib/libglib-2.0.so.0.1200.3
> 0f9c-0f9d rw-p 000b 08:13 2881576
> /lib/libglib-2.0.so.0.1200.3
> 0fa4-0fa5 r-xp  08:13 2881575
> /lib/librt-2.5.so
> 0fa5-0fa6 r--p  08:13 2881575
> /lib/librt-2.5.so
> 0fa6-0fa7

Re: [Ocfs2-users] Syslog reports (ocfs2_wq, 15527, 2):ocfs2_orphan_del:1841 ERROR: status = -2

2011-09-15 Thread Sunil Mushran
66040   -rw-rw   0   503   505   0
12-Sep-2008 22:40 000101f8

I think that's a safe bet considering there's only one orphaned inode.
And it's dated 2008. Notice the size is 0. Meaning the data segments have
all been released. Only the inode remains. fsck should clean this up.

On 09/15/2011 09:03 AM, Daniel Keisling wrote:

> Thanks for the quick reply.  Am I correct in assuming that the bad inode
> is "/dev/mapper/limsd_archp1?"
>
>
> [root@ausracdbd01 tmp]# cat debug_cmds.txt
> ls -l //orphan_dir:
> ls -l //orphan_dir:0001
> ls -l //orphan_dir:0002
> ls -l //orphan_dir:0003
>
> [root@ausracdbd01 tmp]# cat orphans.sh
> #!/bin/sh
>
> while read DEVICE
> do
>  echo Working on $DEVICE
>  debugfs.ocfs2 -f ./debug_cmds.txt $DEVICE
> done<  ocfs2_vols.txt
>
>
>
> [root@ausracdbd01 tmp]# sh orphans.sh
> Working on /dev/mapper/ph1tp1
> debugfs.ocfs2 1.4.4
> debugfs: ls -l //orphan_dir:
>  12  drwxr-xr-x   2 0 04096
> 1-Sep-2008 09:40 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
> debugfs: ls -l //orphan_dir:0001
>  13  drwxr-xr-x   2 0 04096
> 16-Jul-2008 08:18 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
> debugfs: ls -l //orphan_dir:0002
>  14  drwxr-xr-x   2 0 04096
> 16-Jul-2008 12:48 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
> debugfs: ls -l //orphan_dir:0003
>  15  drwxr-xr-x   2 0 04096
> 12-Nov-2008 16:04 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
>
> Working on /dev/mapper/ph1t_archp1
> debugfs.ocfs2 1.4.4
> debugfs: ls -l //orphan_dir:
>  12  drwxr-xr-x   2 0 04096
> 14-Sep-2011 11:30 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
> debugfs: ls -l //orphan_dir:0001
>  13  drwxr-xr-x   2 0 08192
> 23-Aug-2011 10:00 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
> debugfs: ls -l //orphan_dir:0002
>  14  drwxr-xr-x   2 0 04096
> 21-Oct-2008 11:30 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
> debugfs: ls -l //orphan_dir:0003
>  15  drwxr-xr-x   2 0 08192
> 7-Nov-2008 08:06 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:51 ..
>
> Working on /dev/mapper/limstp1
> debugfs.ocfs2 1.4.4
> debugfs: ls -l //orphan_dir:
>  12  drwxr-xr-x   2 0 04096
> 18-Aug-2011 12:58 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:53 ..
> debugfs: ls -l //orphan_dir:0001
>  13  drwxr-xr-x   2 0 04096
> 25-Jun-2008 11:53 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:53 ..
> debugfs: ls -l //orphan_dir:0002
>  14  drwxr-xr-x   2 0 04096
> 25-Jun-2008 11:53 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:53 ..
> debugfs: ls -l //orphan_dir:0003
>  15  drwxr-xr-x   2 0 04096
> 25-Jun-2008 11:53 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:53 ..
>
> Working on /dev/mapper/limst_archp1
> debugfs.ocfs2 1.4.4
> debugfs: ls -l //orphan_dir:
>  12  drwxr-xr-x   2 0 04096
> 14-Sep-2011 11:30 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:54 ..
> debugfs: ls -l //orphan_dir:0001
>  13  drwxr-xr-x   2 0 08192
> 18-Aug-2011 12:53 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:54 ..
> debugfs: ls -l //orphan_dir:0002
>  14  drwxr-xr-x   2 0 04096
> 21-Oct-2008 11:30 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:54 ..
> debugfs: ls -l //orphan_dir:0003
>  15  drwxr-xr-x   2 0 04096
> 21-Oct-2008 11:30 .
>  6   drwxr-xr-x   6 0 04096
> 25-Jun-2008 11:54 ..
>
> Working on /dev/mapper/limsdp1
> debugfs.ocfs2 1.4.4
> debugfs: ls -l //orphan_dir:
>  12  drwxr-xr-x   2 0 04096
> 3-Jun-2008 16:59 .
>  6   drwxr-xr-x   6 0 04096
> 22-May-2008 11:58 ..
> debugfs: ls -l //orphan_dir:0001
>  13  drwxr-xr-x   2 0 04096
> 21-J

Re: [Ocfs2-users] Syslog reports (ocfs2_wq, 15527, 2):ocfs2_orphan_del:1841 ERROR: status = -2

2011-09-15 Thread Daniel Keisling
Thanks for the quick reply.  Am I correct in assuming that the bad inode
is "/dev/mapper/limsd_archp1?"


[root@ausracdbd01 tmp]# cat debug_cmds.txt 
ls -l //orphan_dir:
ls -l //orphan_dir:0001
ls -l //orphan_dir:0002
ls -l //orphan_dir:0003

[root@ausracdbd01 tmp]# cat orphans.sh 
#!/bin/sh

while read DEVICE
do
echo Working on $DEVICE
debugfs.ocfs2 -f ./debug_cmds.txt $DEVICE
done < ocfs2_vols.txt



[root@ausracdbd01 tmp]# sh orphans.sh 
Working on /dev/mapper/ph1tp1
debugfs.ocfs2 1.4.4
debugfs: ls -l //orphan_dir:
12  drwxr-xr-x   2 0 04096
1-Sep-2008 09:40 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..
debugfs: ls -l //orphan_dir:0001
13  drwxr-xr-x   2 0 04096
16-Jul-2008 08:18 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..
debugfs: ls -l //orphan_dir:0002
14  drwxr-xr-x   2 0 04096
16-Jul-2008 12:48 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..
debugfs: ls -l //orphan_dir:0003
15  drwxr-xr-x   2 0 04096
12-Nov-2008 16:04 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..

Working on /dev/mapper/ph1t_archp1
debugfs.ocfs2 1.4.4
debugfs: ls -l //orphan_dir:
12  drwxr-xr-x   2 0 04096
14-Sep-2011 11:30 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..
debugfs: ls -l //orphan_dir:0001
13  drwxr-xr-x   2 0 08192
23-Aug-2011 10:00 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..
debugfs: ls -l //orphan_dir:0002
14  drwxr-xr-x   2 0 04096
21-Oct-2008 11:30 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..
debugfs: ls -l //orphan_dir:0003
15  drwxr-xr-x   2 0 08192
7-Nov-2008 08:06 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:51 ..

Working on /dev/mapper/limstp1
debugfs.ocfs2 1.4.4
debugfs: ls -l //orphan_dir:
12  drwxr-xr-x   2 0 04096
18-Aug-2011 12:58 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:53 ..
debugfs: ls -l //orphan_dir:0001
13  drwxr-xr-x   2 0 04096
25-Jun-2008 11:53 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:53 ..
debugfs: ls -l //orphan_dir:0002
14  drwxr-xr-x   2 0 04096
25-Jun-2008 11:53 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:53 ..
debugfs: ls -l //orphan_dir:0003
15  drwxr-xr-x   2 0 04096
25-Jun-2008 11:53 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:53 ..

Working on /dev/mapper/limst_archp1
debugfs.ocfs2 1.4.4
debugfs: ls -l //orphan_dir:
12  drwxr-xr-x   2 0 04096
14-Sep-2011 11:30 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:54 ..
debugfs: ls -l //orphan_dir:0001
13  drwxr-xr-x   2 0 08192
18-Aug-2011 12:53 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:54 ..
debugfs: ls -l //orphan_dir:0002
14  drwxr-xr-x   2 0 04096
21-Oct-2008 11:30 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:54 ..
debugfs: ls -l //orphan_dir:0003
15  drwxr-xr-x   2 0 04096
21-Oct-2008 11:30 .
6   drwxr-xr-x   6 0 04096
25-Jun-2008 11:54 ..

Working on /dev/mapper/limsdp1
debugfs.ocfs2 1.4.4
debugfs: ls -l //orphan_dir:
12  drwxr-xr-x   2 0 04096
3-Jun-2008 16:59 .
6   drwxr-xr-x   6 0 04096
22-May-2008 11:58 ..
debugfs: ls -l //orphan_dir:0001
13  drwxr-xr-x   2 0 04096
21-Jun-2008 17:39 .
6   drwxr-xr-x   6 0 04096
22-May-2008 11:58 ..
debugfs: ls -l //orphan_dir:0002
14  drwxr-xr-x   2 0 04096
22-May-2008 11:58 .
6   drwxr-xr-x   6 0 04096
22-May-2008 11:58 ..
debugfs: ls -l //orphan_dir:0003
15  drwxr-xr-x   2 0 04096
22-May-2008 11:58 .
6   drwxr-xr-x   6 0 04096
22-May-2008 11:58 ..

Working on /dev/mapper/limsd_archp1
debugfs.ocfs2 1.4.4
debugfs: ls -l //orphan_dir:
12  

Re: [Ocfs2-users] Syslog reports (ocfs2_wq, 15527, 2):ocfs2_orphan_del:1841 ERROR: status = -2

2011-09-15 Thread Sunil Mushran

The issue that caused it has been fixed. The fix is here.
http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commit;h=b6f3de3fd54026df748bfd1449bbe31b9803f8f7

The actual problem could have happened much earlier.
1.4.4 is showing the messages as it is more aggressive (than 1.4.1)
in cleaning up the orphans. By default, the fs scans for orphans
once every 10 mins on a node in the cluster.

fsck should fix it; I suspect you have not yet fscked
that volume.

You can use debugfs.ocfs2 to look at the orphan dirs. List the
system dir using "ls -l //". Then list orphan dirs using
"ls -l //orphan_dir:", etc.

Look at the timestamp on the inodes. The one that is oldest could
be the problematic one. That way you know which volume to fsck.
BTW, it is safe to run debugfs while the fs is mounted. At worst it will
provide you stale info.

On 09/15/2011 07:40 AM, Daniel Keisling wrote:

Hello,
I recently upgraded from OCFS2 v1.4.1 running on RHEL 5.1 with 
kernel-2.6.18-92.1.13.el5 to OCFS2 v1.4.4 running on RHEL 5.6 with 
kernel-2.6.18-194.32.1.el5.  I now see this in syslog every couple of 
minutes:
Sep 15 09:31:51 ausracdbd01 kernel: 
(ocfs2_wq,15527,2):ocfs2_orphan_del:1841 ERROR: status = -2
Sep 15 09:31:51 ausracdbd01 kernel: 
(ocfs2_wq,15527,2):ocfs2_remove_inode:628 ERROR: status = -2
Sep 15 09:31:51 ausracdbd01 kernel: 
(ocfs2_wq,15527,2):ocfs2_wipe_inode:754 ERROR: status = -2
Sep 15 09:31:51 ausracdbd01 kernel: 
(ocfs2_wq,15527,2):ocfs2_delete_inode:999 ERROR: status = -2
This is happening on all cluster nodes, on 3 separate OCFS2 clusters.  
I have performed a 'fsck.ocfs2 -f -y ' on _most_ of the 
filesystems, but not all.  fsck has always come back clean.

Does anyone know the source and fix for this error?
Thanks,
Daniel
*Daniel Keisling*
*Sr. Systems Administrator II*
*Information Technology*

PPD
7551 Metro Center Drive, Suite 300
Austin, TX 78744


*Phone* +1 512 747 5256
*Cell*  +1 512 653 1895
*Fax*   +1 512 685 7256
*e-mail*	daniel.keisl...@ppdi.com

*Web site*	www.ppdi.com


This email transmission and any documents, files or previous email 
messages attached to it may contain information that is confidential 
or legally privileged.
If you are not the intended recipient or a person responsible for 
delivering this transmission to the intended recipient, you are hereby 
notified
that you must not read this transmission and that any disclosure, 
copying, printing, distribution or use of this transmission is 
strictly prohibited.
If you have received this transmission in error, please immediately 
notify the sender by telephone or return email and delete the original 
transmission and its attachments without reading or saving in any manner.






[Ocfs2-users] Syslog reports (ocfs2_wq, 15527, 2):ocfs2_orphan_del:1841 ERROR: status = -2

2011-09-15 Thread Daniel Keisling
Hello,
 
I recently upgraded from OCFS2 v1.4.1 running on RHEL 5.1 with
kernel-2.6.18-92.1.13.el5 to OCFS2 v1.4.4 running on RHEL 5.6 with
kernel-2.6.18-194.32.1.el5.  I now see this in syslog every couple of
minutes:
 
Sep 15 09:31:51 ausracdbd01 kernel:
(ocfs2_wq,15527,2):ocfs2_orphan_del:1841 ERROR: status = -2
Sep 15 09:31:51 ausracdbd01 kernel:
(ocfs2_wq,15527,2):ocfs2_remove_inode:628 ERROR: status = -2
Sep 15 09:31:51 ausracdbd01 kernel:
(ocfs2_wq,15527,2):ocfs2_wipe_inode:754 ERROR: status = -2
Sep 15 09:31:51 ausracdbd01 kernel:
(ocfs2_wq,15527,2):ocfs2_delete_inode:999 ERROR: status = -2

 
This is happening on all cluster nodes, on 3 separate OCFS2 clusters.  I
have performed a 'fsck.ocfs2 -f -y ' on _most_ of the
filesystems, but not all.  fsck has always come back clean.
 
Does anyone know the source and fix for this error?
 
Thanks,
 
Daniel
 
 
 
 
Daniel Keisling
Sr. Systems Administrator II
Information Technology


PPD
7551 Metro Center Drive, Suite 300
Austin, TX 78744

Phone+1 512 747 5256
Cell +1 512 653 1895
Fax  +1 512 685 7256
e-mail   daniel.keisl...@ppdi.com
Web site www.ppdi.com
 


Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-15 Thread Betzos Giorgos
Hello,

I am sorry for the delay in responding. Unfortunately, it faulted again.

Here is the log. Although my email client folds the Memory Map lines.
The core file is available.

Thanks,

George

# ./o2image.ppc.dbg /dev/mapper/mpath0 /files_shared/u02.o2image
*** glibc detected *** ./o2image.ppc.dbg: corrupted double-linked list:
0x10075000 ***
=== Backtrace: =
/lib/libc.so.6[0xfeb1ab4]
/lib/libc.so.6(cfree+0xc8)[0xfeb5b68]
./o2image.ppc.dbg[0x1000d098]
./o2image.ppc.dbg[0x1000297c]
./o2image.ppc.dbg[0x10001eb8]
./o2image.ppc.dbg[0x1000228c]
./o2image.ppc.dbg[0x10002804]
./o2image.ppc.dbg[0x10001eb8]
./o2image.ppc.dbg[0x1000228c]
./o2image.ppc.dbg[0x10002804]
./o2image.ppc.dbg[0x10003bbc]
./o2image.ppc.dbg[0x10004480]
/lib/libc.so.6[0xfe4dc60]
/lib/libc.so.6[0xfe4dea0]
======= Memory map: ========
0010-0012 r-xp 0010 00:00 0
[vdso]
0f43-0f44 r-xp  08:13
180307 /lib/libcom_err.so.2.1
0f44-0f45 rw-p  08:13
180307 /lib/libcom_err.so.2.1
0f90-0f9c r-xp  08:13
180293 /lib/libglib-2.0.so.0.1200.3
0f9c-0f9d rw-p 000b 08:13
180293 /lib/libglib-2.0.so.0.1200.3
0fa4-0fa5 r-xp  08:13
180292 /lib/librt-2.5.so
0fa5-0fa6 r--p  08:13
180292 /lib/librt-2.5.so
0fa6-0fa7 rw-p 0001 08:13
180292 /lib/librt-2.5.so
0fce-0fd0 r-xp  08:13
180291 /lib/libpthread-2.5.so
0fd0-0fd1 r--p 0001 08:13
180291 /lib/libpthread-2.5.so
0fd1-0fd2 rw-p 0002 08:13
180291 /lib/libpthread-2.5.so
0fe3-0ffa r-xp  08:13
180288 /lib/libc-2.5.so
0ffa-0ffb r--p 0016 08:13
180288 /lib/libc-2.5.so
0ffb-0ffc rw-p 0017 08:13
180288 /lib/libc-2.5.so
0ffc-0ffe r-xp  08:13
180287 /lib/ld-2.5.so
0ffe-0fff r--p 0001 08:13
180287 /lib/ld-2.5.so
0fff-1000 rw-p 0002 08:13
180287 /lib/ld-2.5.so
1000-1005 r-xp  08:13
7487795 /root/o2image.ppc.dbg
1005-1006 rw-p 0004 08:13
7487795 /root/o2image.ppc.dbg
1006-1009 rwxp 1006 00:00 0
[heap]
f768-f7ff rw-p f768 00:00 0
ff9a-ffaf rw-p ff9a 00:00 0
[stack]
Aborted (core dumped)
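Since the "corrupted double-linked list" abort comes from glibc's own heap-consistency checks, one general way to localize the bad free (a sketch, not something from this thread; the binary path is taken from the report above) is to rerun under MALLOC_CHECK_, which makes glibc abort at the first inconsistent free rather than later:

```shell
# MALLOC_CHECK_=3 turns on strict glibc heap checking: print a diagnostic
# and abort immediately when a bad free/realloc is detected.
run_with_heap_checks() {
    MALLOC_CHECK_=3 "$@"
}

# Guarded so a host without the debug binary reports rather than fails;
# the o2image path is the one from the report.
if [ -x ./o2image.ppc.dbg ]; then
    run_with_heap_checks ./o2image.ppc.dbg /dev/mapper/mpath0 /files_shared/u02.o2image
else
    echo "o2image binary not present on this host"
fi
```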
 

On Thu, 2011-09-08 at 12:10 -0700, Sunil Mushran wrote:
> http://oss.oracle.com/~smushran/o2image.ppc.dbg
> 
> Use the above executable. Hoping it won't fault. But if it does
> email me the backtrace. That trace will be readable as the exec
> has debugging symbols enabled.
> 
> On 09/07/2011 11:24 PM, Betzos Giorgos wrote:
> > # rpm -q ocfs2-tools
> > ocfs2-tools-1.4.4-1.el5.ppc
> >
> > On Wed, 2011-09-07 at 09:13 -0700, Sunil Mushran wrote:
> >> version of ocfs2-tools?
> >>
> >> On 09/07/2011 09:10 AM, Betzos Giorgos wrote:
> >>> Hello,
> >>>
> >>> I tried what you suggested but here is what I got:
> >>>
> >>> # o2image /dev/mapper/mpath0 /files_shared/u02.o2image
> >>> *** glibc detected *** o2image: corrupted double-linked list: 0x10045000 
> >>> ***
> >>> ======= Backtrace: =========
> >>> /lib/libc.so.6[0xfeb1ab4]
> >>> /lib/libc.so.6(cfree+0xc8)[0xfeb5b68]
> >>> o2image[0x10007bb0]
> >>> o2image[0x10002748]
> >>> o2image[0x10001f50]
> >>> o2image[0x10002334]
> >>> o2image[0x100026a0]
> >>> o2image[0x10001f50]
> >>> o2image[0x10002334]
> >>> o2image[0x100026a0]
> >>> o2image[0x1000358c]
> >>> o2image[0x10003e28]
> >>> /lib/libc.so.6[0xfe4dc60]
> >>> /lib/libc.so.6[0xfe4dea0]
> >>> ======= Memory map: ========
> >>> 0010-0012 r-xp 0010 00:00 0  
> >>> [vdso]
> >>> 0f55-0f56 r-xp  08:13 2881590
> >>> /lib/libcom_err.so.2.1
> >>> 0f56-0f57 rw-p  08:13 2881590
> >>> /lib/libcom_err.so.2.1
> >>> 0f90-0f9c r-xp  08:13 2881576
> >>> /lib/libglib-2.0.so.0.1200.3
> >>> 0f9c-0f9d rw-p 000b 08:13 2881576
> >>> /lib/libglib-2.0.so.0.1200.3
> >>> 0fa4-0fa5 r-xp  08:13 2881575
> >>> /lib/librt-2.5.so
> >>> 0fa5-0fa6 r--p  08:13 2881575
> >>> /lib/librt-2.5.so
> >>> 0fa6-0fa7 rw-p 0001 08:13 2881575
> >>> /lib/librt-2.5.so
> >>> 0fce-0fd0 r-xp  08:13 2881574
> >>> /lib/libpthread-2.5.so
> >>> 0fd0-0fd1 r--p 0001 08:13 2881574
> >>> /lib/

[Ocfs2-users] The mounting of too many OCFS2 volumes (i.e. 50 or more) per cluster

2011-09-15 Thread Marko Sutic
Hi list,

I have a question concerning number of OCFS2 volumes per cluster.

From our storage vendor we received recommendations on how to configure mount
volumes per database to gain the best possible performance.
Basically, we should separate redo logs, archive logs, temp, data, etc.

This is not hard to configure but I'm concerned about this line that I've
found on Oracle support site:

Linux OCFS2 - Best Practices [ID 603080.1]
Number of volumes
The mounting of too many OCFS2 volumes (i.e. 50 or more) per cluster is
likely to create a performance (process) bottleneck - this is not
specifically related to OCFS2. Ideally, it is desirable to have no more than
around 20 OCFS2 partitions per system.
See also http://oss.oracle.com/bugzilla/show_bug.cgi?id=992



In our configuration we would need more than 60 OCFS2 mounted volumes per
cluster, so I don't know whether we should expect performance problems due to
the number of OCFS2 volumes.
What is your recommendation on the number of OCFS2 volumes per cluster with
regard to performance and stability?
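As a quick sanity check against the ~20-volume guideline quoted above, the current count on a node can be read from /proc/mounts. The threshold below is just the note's rule of thumb, not a hard limit:

```shell
# Count OCFS2 filesystems in /proc/mounts-style input (one mount per line,
# field 3 is the filesystem type) and compare against the ~20-volume
# guideline from MOS note 603080.1.
count_ocfs2_mounts() {
    awk '$3 == "ocfs2"' | wc -l
}

count=$(count_ocfs2_mounts < /proc/mounts)
if [ "$count" -gt 20 ]; then
    echo "WARNING: $count OCFS2 mounts on this node (guideline: ~20)"
fi
```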


Our kernel and ocfs2 version:
# uname -rvp
2.6.18-274.0.0.0.1.el5 #1 SMP Mon Jul 25 14:33:14 EDT 2011 x86_64

# rpm -qa|grep ocfs2
ocfs2-tools-1.6.3-2.el5
ocfs2-2.6.18-274.0.0.0.1.el5-1.4.8-2.el5
ocfs2console-1.6.3-2.el5



Thank you very much for your help.

Regards,
Marko Sutic