Re: [Ocfs2-users] null pointer dereference

2012-08-24 Thread Pawel

On 2012-08-22 18:23, srinivas eeda wrote:
The crash looks similar to what the patch
https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008469.html
is trying to address. The fix is not yet accepted because, as explained in
the patch description, we need to fix the master node to skip sending a
BAST after receiving an unlock message.


Regarding ERROR: status = -17: what storage do you use? It could be due to
stale data.

The storage is 400G in size.
OCFS2 runs over AoE.




On 8/22/2012 2:25 AM, Pawel wrote:

That has been done multiple times;
the filesystem was even recreated with mkfs.
Still the same behavior...


Pawel

On 2012-08-22 04:21, Sunil Mushran wrote:

You may want to run a full fsck on the fs.

fsck.ocfs2 -fy /dev/

On Tue, Aug 21, 2012 at 12:49 AM, Pawel pzl...@mp.pl wrote:


Hi,
After upgrading ocfs2 my cluster is unstable.

At least once per week I see:
kernel panic: Null pointer dereference  at 00048
o2dlm_blocking_ast_wrapper + 0x8/0x20 [ocfs2_stack_o2cb]
stack:
dlm_do_local_bast [ocfs2_dlm]
dlm_lookup_lockers [ocfs2_dlm]
dlm_proxy_ast_handler
add_timer
..

After that, a deadlock sometimes happens on other nodes. Restarting the
entire cluster solves the issue.
I see in the log:
(dlm_thread,7227,3):dlm_send_proxy_ast_msg:484 ERROR: ECB9442E19A94EAC896641BFADD55E4B: res M0001f411c9, error -107 send AST to node 4
(dlm_thread,7227,3):dlm_flush_asts:605 ERROR: status = -107
o2net: No connection established with node 4 after 10.0 seconds, giving up.
o2net: No connection established with node 4 after 10.0 seconds, giving up.
o2net: No connection established with node 4 after 10.0 seconds, giving up.
(dlm_thread,7227,4):dlm_send_proxy_ast_msg:484 ERROR: ECB9442E19A94EAC896641BFADD55E4B: res M0001f411c9, error -107 send AST to node 4
(dlm_thread,7227,4):dlm_flush_asts:605 ERROR: status = -107
o2cb: o2dlm has evicted node 4 from domain ECB9442E19A94EAC896641BFADD55E4B
o2cb: o2dlm has evicted node 4 from domain ECB9442E19A94EAC896641BFADD55E4B
o2dlm: Begin recovery on domain ECB9442E19A94EAC896641BFADD55E4B for node 4
o2dlm: Node 5 (he) is the Recovery Master for the dead node 4 in domain ECB9442E19A94EAC896641BFADD55E4B
o2dlm: End recovery on domain ECB9442E19A94EAC896641BFADD55E4B


Additionally, about 4 times per day I see:

ocfs2_check_dir_for_entry:2119 ERROR: status = -17
ocfs2_mknod:459 ERROR: status = -17
ocfs2_create:629 ERROR: status = -17
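
For anyone decoding these messages: the negative status values follow the usual kernel convention of a negated errno. A minimal sketch, assuming Linux errno numbering (the table is hard-coded, and the helper name `decode_status` is hypothetical, not part of any ocfs2 tool):

```python
# Linux errno values for statuses that show up in ocfs2 logs like these.
# Hard-coded on purpose: errno numbering differs between operating systems.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    17: ("EEXIST", "File exists"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}


def decode_status(status):
    """Return a readable name for a kernel-style negative status code."""
    name, text = LINUX_ERRNO.get(abs(status), ("UNKNOWN", "not in this table"))
    return f"{status}: {name} ({text})"


# Decode the two statuses reported in this message.
for status in (-17, -107):
    print(decode_status(status))
```

Read this way, status = -17 on the create path would suggest the name was already present when the create ran, and error -107 would suggest the connection to node 4 was down when the AST was sent. Neither reading identifies the root cause by itself.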


I currently use kernel 3.4.2.
My filesystem was created with:
-N 8 -b 4096 -C 32768 --fs-features

backup-super,strict-journal-super,sparse,extended-slotmap,inline-data,metaecc,xattr,indexed-dirs,refcount,discontig-bg,unwritten,usrquota,grpquota

Could you tell me what could make my system unstable? Which
feature?

Thanks for any help.

Pawel


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users







[Ocfs2-users] Issue with OCFS2 mount

2012-08-24 Thread Rory Kilkenny
We have an HP P2000 G3 Storage array, fiber connected.  The storage array
has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).

A & B are both mounted and formatted as NTFS.

One of the volumes is NFS mounted.

Every couple of months or so we start getting tons of errors on the NFS
mounted volume:


 Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086.  Applying ECC.
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5
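
The entries above share a common prefix, which makes them easy to tally when the log gets long. A rough sketch, assuming the prefix has the form (task,pid,cpu):function:line (reading the third field as a CPU number is an assumption here), with hypothetical helper names `parse_mlog` and `status_histogram`:

```python
import re
from collections import Counter

# Matches the prefix seen in the log: (task,pid,cpu):function:line ERROR: message
MLOG_RE = re.compile(
    r"\((?P<task>[^,]+),(?P<pid>\d+),(?P<cpu>\d+)\):"
    r"(?P<func>\w+):(?P<line>\d+) ERROR: (?P<msg>.*)"
)


def parse_mlog(text):
    """Yield one dict per ocfs2 ERROR entry found in text."""
    for match in MLOG_RE.finditer(text):
        yield match.groupdict()


def status_histogram(text):
    """Count the 'status = N' values across all parsed ERROR entries."""
    counts = Counter()
    for entry in parse_mlog(text):
        found = re.search(r"status = (-?\d+)", entry["msg"])
        if found:
            counts[int(found.group(1))] += 1
    return counts


# Three entries copied from the log above, one per line.
sample = """\
Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
"""

for entry in parse_mlog(sample):
    print(entry["func"], entry["line"], entry["msg"])
print(status_histogram(sample))
```

The pattern expects each entry on a single line, so a log that has been hard-wrapped by a mail client needs its lines rejoined first.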
 

If we pull all the data off, destroy the volume, rebuild it, and copy our
data back, all works fine; for a while.

This issue does not happen on the non NFS mounted volume. I am currently
assuming the issue is with NFS and how we have it configured (which to the
best of my knowledge is default).

Has anyone had a similar experience and can share some insight or
knowledge of any tricks with NFS and OCFS2 volumes?

Thanks in advance.




Re: [Ocfs2-users] Issue with OCFS2 mount

2012-08-24 Thread Sunil Mushran
What are the versions of the kernel, ocfs2, and the ocfs2 tools?

uname -a
modinfo ocfs2
mkfs.ocfs2 --version

On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny rory.kilke...@ticoon.com wrote:

  We have an HP P2000 G3 Storage array, fiber connected.  The storage
 array has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).

 A & B are both mounted and formatted as NTFS.

 One of the volumes is NFS mounted.

 Every couple of months or so we start getting tons of errors on the NFS
 mounted volume:


 Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086.  Applying ECC.
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
 Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5


 If we pull all the data off, destroy the volume, rebuild it, and copy our
 data back, all works fine; for a while.

 This issue does not happen on the non NFS mounted volume. I am currently
 assuming the issue is with NFS and how we have it configured (which to the
 best of my knowledge is default).

 Has anyone had a similar experience and can share some insight or
 knowledge of any tricks with NFS and OCFS2 volumes?

 Thanks in advance.


