Re: [Ocfs2-users] null pointer dereference
On 2012-08-22 18:23, srinivas eeda wrote:

The crash looks similar to what the patch https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008469.html is trying to address. The fix is not yet accepted because, as explained in the patch description, we need to fix the master node to skip sending a BAST after receiving an unlock message.

Regarding ERROR: status = -17, what storage do you use? It could be due to stale data.

The size of the storage is 400G. OCFS2 works over AoE.

On 8/22/2012 2:25 AM, Pawel wrote:

It was done multiple times; what is more, the filesystem was recreated with mkfs. Still the same behavior...

Pawel

On 2012-08-22 04:21, Sunil Mushran wrote:

You may want to run a full fsck on the fs.

fsck.ocfs2 -fy /dev/

On Tue, Aug 21, 2012 at 12:49 AM, Pawel pzl...@mp.pl wrote:

Hi,

After upgrading ocfs2 my cluster is unstable. At least once per week I see:

kernel panic: Null pointer dereference at 00048
o2dlm_blocking_ast_wrapper + 0x8/0x20 [ocfs2_stack_o2cb]
stack:
dlm_do_local_bast [ocfs2_dlm]
dlm_lookup_lockers [ocfs2_dlm]
dlm_proxy_ast_handler
add_timer
..

After that, a deadlock sometimes happens on other nodes. Restarting the entire cluster solves the issue.

I see in the log:

(dlm_thread,7227,3):dlm_send_proxy_ast_msg:484 ERROR: ECB9442E19A94EAC896641BFADD55E4B: res M0001f411c9, error -107 send AST to node 4
(dlm_thread,7227,3):dlm_flush_asts:605 ERROR: status = -107
o2net: No connection established with node 4 after 10.0 seconds, giving up.
o2net: No connection established with node 4 after 10.0 seconds, giving up.
o2net: No connection established with node 4 after 10.0 seconds, giving up.
(dlm_thread,7227,4):dlm_send_proxy_ast_msg:484 ERROR: ECB9442E19A94EAC896641BFADD55E4B: res M0001f411c9, error -107 send AST to node 4
(dlm_thread,7227,4):dlm_flush_asts:605 ERROR: status = -107
o2cb: o2dlm has evicted node 4 from domain ECB9442E19A94EAC896641BFADD55E4B
o2cb: o2dlm has evicted node 4 from domain ECB9442E19A94EAC896641BFADD55E4B
o2dlm: Begin recovery on domain ECB9442E19A94EAC896641BFADD55E4B for node 4
o2dlm: Node 5 (he) is the Recovery Master for the dead node 4 in domain ECB9442E19A94EAC896641BFADD55E4B
o2dlm: End recovery on domain ECB9442E19A94EAC896641BFADD55E4B

Additionally, about 4 times per day I see:

ocfs2_check_dir_for_entry:2119 ERROR: status = -17
ocfs2_mknod:459 ERROR: status = -17
ocfs2_create:629 ERROR: status = -17

I currently use kernel 3.4.2. My filesystem was created with:

-N 8 -b 4096 -C 32768 --fs-features backup-super,strict-journal-super,sparse,extended-slotmap,inline-data,metaecc,xattr,indexed-dirs,refcount,discontig-bg,unwritten,usrquota,grpquota

Could you tell me what could make my system unstable? Which feature?

Thanks for any help
Pawel

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
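For readers following the thread, the negative status values in the logs above can be read as negated Linux errno codes, assuming the usual kernel convention for return values. The sketch below decodes them; the table is hardcoded with Linux values so the result does not depend on the host OS, and the names LINUX_ERRNO and describe_status are illustrative only, not part of any OCFS2 tool.

```python
# Linux errno values, hardcoded so the result is the same on any host OS.
# Only the codes that appear in these threads are listed.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    17: ("EEXIST", "File exists"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}


def describe_status(status: int) -> str:
    """Translate a negative kernel status (e.g. -107) into its errno name."""
    name, text = LINUX_ERRNO.get(abs(status), ("UNKNOWN", "not in this table"))
    return f"status {status}: {name} ({text})"


for status in (-107, -17):
    print(describe_status(status))
```

Read this way, -107 matches the "No connection established with node 4" messages, and -17 means that the name being created already existed in the directory.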
[Ocfs2-users] Issue with OCFS2 mount
We have an HP P2000 G3 storage array, fiber connected. The storage array has a RAID5 array broken into 2 physical OCFS2 volumes (A & B). A & B are both mounted and formatted as NTFS. One of the volumes is NFS mounted.

Every couple of months or so we start getting tons of errors on the NFS mounted volume:

Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086. Applying ECC.
Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5

If we pull all the data off, destroy the volume, rebuild it, and copy our data back, all works fine, for a while.

This issue does not happen on the non-NFS-mounted volume.

I am currently assuming the issue is with NFS and how we have it configured (which, to the best of my knowledge, is the default).

Has anyone had a similar experience and can share some insight and knowledge on any tricks with NFS and OCFS2 volumes?

Thanks in advance.
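When a burst like the one above runs to many lines, it can help to reduce it to the reporting function and status of each entry. The following is a minimal sketch, assuming the prefix in parentheses is (task name, pid, cpu) followed by function:line; that reading of the format, and the names LOG_RE and parse_line, are assumptions for illustration. The sample lines are copied from the message above.

```python
import re

# Assumed layout: (task,pid,cpu):function:line ERROR: message
LOG_RE = re.compile(
    r"\((?P<task>[^,]+),(?P<pid>\d+),(?P<cpu>\d+)\):"
    r"(?P<func>\w+):(?P<line>\d+) ERROR: (?P<msg>.*)$"
)
STATUS_RE = re.compile(r"status = (-?\d+)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    status = STATUS_RE.search(record["msg"])
    # Entries such as the CRC32 messages carry no numeric status.
    record["status"] = int(status.group(1)) if status else None
    return record


SAMPLE = [
    "Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089",
    "Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5",
    "Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5",
]

records = [parse_line(line) for line in SAMPLE]
for record in records:
    print(record["func"], record["status"])
```

Listing the functions in order shows the error propagating from the extent block checksum failure up through the truncate and delete path, with -5 being EIO on Linux.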
Re: [Ocfs2-users] Issue with OCFS2 mount
What are the versions of the kernel, ocfs2, and ocfs2 tools?

uname -a
modinfo ocfs2
mkfs.ocfs2 --version

On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny rory.kilke...@ticoon.com wrote:

We have an HP P2000 G3 storage array, fiber connected. The storage array has a RAID5 array broken into 2 physical OCFS2 volumes (A & B). A & B are both mounted and formatted as NTFS. One of the volumes is NFS mounted.

Every couple of months or so we start getting tons of errors on the NFS mounted volume:

Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086. Applying ECC.
Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5

If we pull all the data off, destroy the volume, rebuild it, and copy our data back, all works fine, for a while.

This issue does not happen on the non-NFS-mounted volume.

I am currently assuming the issue is with NFS and how we have it configured (which, to the best of my knowledge, is the default).

Has anyone had a similar experience and can share some insight and knowledge on any tricks with NFS and OCFS2 volumes?

Thanks in advance.
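If the three version commands requested above need to be run on several nodes, a small wrapper can collect their output in one place. This is only a sketch: the helper names run_command and collect_versions are hypothetical, and a missing tool is reported rather than treated as an error, since not every host has ocfs2-tools installed.

```python
import shutil
import subprocess

# The three commands requested in the reply above.
COMMANDS = [
    ["uname", "-a"],
    ["modinfo", "ocfs2"],
    ["mkfs.ocfs2", "--version"],
]


def run_command(cmd):
    """Return the command's output, or a note if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not found on this host"
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    # Keep both streams, since a tool may print its version on either one.
    return (result.stdout + result.stderr).strip()


def collect_versions(commands=COMMANDS):
    """Map each command line to the text it produced."""
    return {" ".join(cmd): run_command(cmd) for cmd in commands}


if __name__ == "__main__":
    for name, output in collect_versions().items():
        print(f"$ {name}\n{output}\n")
```

The resulting text can be pasted into a reply to the list as is.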