Re: [Ocfs2-users] ocf2 mount point hangs
thanks, still nothing when using find / -iname '*suballoc.*'

This is the code:

2398         /* The caller got this descriptor from
2399          * ocfs2_read_group_descriptor(). Any corruption is a code bug. */
2400         BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(bg));
2401
2402         trace_ocfs2_block_group_clear_bits(bit_off, num_bits);
2403
2404         BUG_ON(undo_fn && !ocfs2_is_cluster_bitmap(alloc_inode));
2405         status = ocfs2_journal_access_gd(handle, INODE_CACHE(alloc_inode),
2406                                          group_bh,
2407                                          undo_fn ?
2408                                          OCFS2_JOURNAL_ACCESS_UNDO :
2409                                          OCFS2_JOURNAL_ACCESS_WRITE);
2410         if (status < 0) {
2411                 mlog_errno(status);
2412                 goto bail;
2413         }
2414
2415         if (undo_fn) {
2416                 jbd_lock_bh_state(group_bh);
2417                 undo_bg = (struct ocfs2_group_desc *)
2418                                 bh2jh(group_bh)->b_committed_data;
2419                 BUG_ON(!undo_bg);
2420         }
2421
2422         tmp = num_bits;
2423         while (tmp--) {
2424                 ocfs2_clear_bit((bit_off + tmp),
2425                                 (unsigned long *) bg->bg_bitmap);
2426                 if (undo_fn)
2427                         undo_fn(bit_off + tmp,
2428                                 (unsigned long *) undo_bg->bg_bitmap);
2429         }
2430         le16_add_cpu(&bg->bg_free_bits_count, num_bits);
2431         if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) {
2432                 ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit"
2433                             " count %u but claims %u are freed. num_bits %d",
2434                             (unsigned long long)le64_to_cpu(bg->bg_blkno),
2435                             le16_to_cpu(bg->bg_bits),
2436                             le16_to_cpu(bg->bg_free_bits_count), num_bits);
2437                 return -EROFS;
2438         }

On Wed, Sep 14, 2016 at 1:52 PM, Werner Flamme wrote:
> Ishmael Tsoaela [14.09.2016 13:43]:
>> thanks for the response. I actually downloaded the source code for
>> kernel version 4.2.0. I am on the same version as:
>>
>> # uname -r
>> 4.2.0-27-generic
>>
>> wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux_4.2.0.orig.tar.gz
>> tar xvf linux_4.2.0.orig.tar.gz
>> cd /home/ishmael/linux-4.2/fs/ocfs2
>>
>> I found suballoc.c in there.
>>
>> I was not able to find the code on the OS itself:
>>
>> root@nodeB:/# find / -iname *suballoc.*
>
> Better use # find / -iname '*suballoc.*', so that your shell will not
> make the standard replacements on * :)
>
>> Will the code in the last email suffice?
>
> I wouldn't know until you tell us that the code in your file is the same
> as you posted :)
>
> Werner

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
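Werner's quoting advice is easy to demonstrate in isolation. A minimal sketch (the /tmp/globdemo layout is hypothetical, not from the thread): unquoted, the shell expands the glob against the current directory before find ever runs; quoted, find receives the literal pattern and matches it at every depth.

```shell
# Hypothetical scratch layout to show why the quotes matter.
mkdir -p /tmp/globdemo/fs/ocfs2
cd /tmp/globdemo
touch fs/ocfs2/suballoc.c

# Unquoted, the shell tries to expand *suballoc.* against the current
# directory first; if a matching file existed here, find would receive
# that filename instead of the pattern and silently search for the
# wrong thing.
#
# Quoted, find always receives the literal pattern:
find . -iname '*suballoc.*'
# -> ./fs/ocfs2/suballoc.c
```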
Re: [Ocfs2-users] ocf2 mount point hangs
Hi Eric,

I found the code below from
archive.ubuntu.com/ubuntu/pool/main/l/linux/fs/ocfs2/suballoc.c

2410         if (status < 0) {
2411                 mlog_errno(status);
2412                 goto bail;
2413         }
2414
2415         if (undo_fn) {
2416                 jbd_lock_bh_state(group_bh);
2417                 undo_bg = (struct ocfs2_group_desc *)
2418                                 bh2jh(group_bh)->b_committed_data;
2419                 BUG_ON(!undo_bg);
2420         }
2421
2422         tmp = num_bits;
2423         while (tmp--) {
2424                 ocfs2_clear_bit((bit_off + tmp),
2425                                 (unsigned long *) bg->bg_bitmap);
2426                 if (undo_fn)
2427                         undo_fn(bit_off + tmp,
2428                                 (unsigned long *) undo_bg->bg_bitmap);
2429         }
2430         le16_add_cpu(&bg->bg_free_bits_count, num_bits);
2431         if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) {
2432                 ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit"
2433                             " count %u but claims %u are freed. num_bits %d",
2434                             (unsigned long long)le64_to_cpu(bg->bg_blkno),
2435                             le16_to_cpu(bg->bg_bits),
2436                             le16_to_cpu(bg->bg_free_bits_count), num_bits);
2437                 return -EROFS;
2438         }

On Wed, Sep 14, 2016 at 10:13 AM, Eric Ren wrote:
> Hi,
>
> On 09/14/2016 02:30 PM, Ishmael Tsoaela wrote:
>>
>> Hi Eric,
>>
>> Could you paste the code context around this line?
>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>
> This message is very important because it shows exactly which line of the
> source code directly results in this BUG() output. What I want you to do
> is paste the code around line 2419 of suballoc.c. That way I can locate
> the BUG() locally, because line 2419 differs between code versions.
>
>> Apologies, but I tried to understand this but failed
>>
>> root@nodeB:~# echo w > /proc/sysrq-trigger
>> root@nodeB:~#
>>
>> The node rebooted and the mount points are accessible from all 3 nodes.
>> Not sure why, but it seems it will be difficult to figure out what went
>> wrong with ocfs2 without proper knowledge, so let me not waste any of
>> your time; let me figure out `crash`[1][2] or gdb, then hopefully when
>> it happens next time I will have a much better understanding
>
> OK, good luck!
>
> Eric
>
>> On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren wrote:
>>> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
>>>> Hi Eric,
>>>>
>>>> Sorry. Here are the other 2 syslogs if you need them, and the debug
>>>> output
>>>
>>> According to the logs, nodeB should be the first one that got the
>>> problem.
>>>
>>> Could you paste the code context around this line?
>>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>>
>>>> The request in the snip attached just hangs
>>>
>>> NodeB should have taken this exclusive cluster lock, so any commands
>>> trying to access that file will hang.
>>>
>>> Could you provide the output of `echo w > /proc/sysrq-trigger`? An OCFS2
>>> issue is not easy to debug if the developer cannot reproduce it locally,
>>> and this is the case. BTW, you can narrow it down with `crash`[1][2]
>>> or gdb if you have some knowledge of kernel stuff.
>>>
>>> [1] http://www.dedoimedo.com/computers/crash-analyze.html
>>> [2] https://people.redhat.com/anderson/crash_whitepaper/
>>>
>>> Eric
>>>
>>>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela wrote:
>>>>> Thanks for the response
>>>>>
>>>>> 1. the disk is a shared ceph rbd device
>>>>>
>>>>> #rbd showmapped
>>>>> id pool     image          snap device
>>>>> 1  vmimages block_vmimages -    /dev/rbd1
>>>>>
>>>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12
>>>>> days ago
>>>>>
>>>>> 3. 3 ceph nodes all have the rbd image mapped and ocfs2 mounted
>>>>>
>>>>> commands used
>>>>>
>>>>> #sudo rbd map block_vmimages --pool vmimages --name
>>>>>
>>>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>>>> /dev/rbd1
>>>>>
>>>>> 4.
>>>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>>>> Revision: 0.90
>>>>> Mount Count: 0   Max Mount Count: 20
>>>>> State: 0   Errors: 0
>>>>> Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>>>> Creator OS: 0
>>>>> Feature Compat: 3 backup-super strict-journal-super
>>>>> Feature Incompat: 592 sparse inline-data xattr
>>>>> Tunefs Incomplete: 0
>>>>> Feature RO compat: 1 unwritten
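Since the same `debugfs.ocfs2 -R stats` output is being collected on every node, one quick sanity check is to diff the identifying lines across nodes. A sketch only: the /tmp capture files below are hypothetical, seeded with the UUID and generation values that appear in the thread.

```shell
# Hypothetical per-node captures of the identifying lines from
# 'debugfs.ocfs2 -R stats /dev/rbd1'.
cat > /tmp/nodeB.stats <<'EOF'
UUID: 238F878003E7455FA5B01CC884D1047F
FS Generation: 1754092981 (0x688d55b5)
EOF
cat > /tmp/nodeC.stats <<'EOF'
UUID: 238F878003E7455FA5B01CC884D1047F
FS Generation: 1754092981 (0x688d55b5)
EOF

# Identical UUID/generation lines mean both nodes see the same ocfs2
# superblock; any diff output would point at a mapping problem on one node.
diff /tmp/nodeB.stats /tmp/nodeC.stats && echo "superblocks match"
# -> superblocks match
```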
Re: [Ocfs2-users] ocf2 mount point hangs
Hi,

On 09/14/2016 02:30 PM, Ishmael Tsoaela wrote:
> Hi Eric,
>
> Could you paste the code context around this line?
> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!

This message is very important because it shows exactly which line of the
source code directly results in this BUG() output. What I want you to do is
paste the code around line 2419 of suballoc.c. That way I can locate the
BUG() locally, because line 2419 differs between code versions.

> Apologies, but I tried to understand this but failed
>
> root@nodeB:~# echo w > /proc/sysrq-trigger
> root@nodeB:~#
>
> The node rebooted and the mount points are accessible from all 3 nodes.
> Not sure why, but it seems it will be difficult to figure out what went
> wrong with ocfs2 without proper knowledge, so let me not waste any of your
> time; let me figure out `crash`[1][2] or gdb, then hopefully when it
> happens next time I will have a much better understanding

OK, good luck!

Eric

> On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren wrote:
>> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
>>> Hi Eric,
>>>
>>> Sorry. Here are the other 2 syslogs if you need them, and the debug
>>> output
>>
>> According to the logs, nodeB should be the first one that got the
>> problem.
>>
>> Could you paste the code context around this line?
>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>
>>> The request in the snip attached just hangs
>>
>> NodeB should have taken this exclusive cluster lock, so any commands
>> trying to access that file will hang.
>>
>> Could you provide the output of `echo w > /proc/sysrq-trigger`? An OCFS2
>> issue is not easy to debug if the developer cannot reproduce it locally,
>> and this is the case. BTW, you can narrow it down with `crash`[1][2]
>> or gdb if you have some knowledge of kernel stuff.
>>
>> [1] http://www.dedoimedo.com/computers/crash-analyze.html
>> [2] https://people.redhat.com/anderson/crash_whitepaper/
>>
>> Eric
>>
>>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela wrote:
>>>> Thanks for the response
>>>>
>>>> 1. the disk is a shared ceph rbd device
>>>>
>>>> #rbd showmapped
>>>> id pool     image          snap device
>>>> 1  vmimages block_vmimages -    /dev/rbd1
>>>>
>>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>>>> ago
>>>>
>>>> 3. 3 ceph nodes all have the rbd image mapped and ocfs2 mounted
>>>>
>>>> commands used
>>>>
>>>> #sudo rbd map block_vmimages --pool vmimages --name
>>>>
>>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>>> /dev/rbd1
>>>>
>>>> 4.
>>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>>> Revision: 0.90
>>>> Mount Count: 0   Max Mount Count: 20
>>>> State: 0   Errors: 0
>>>> Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>>> Creator OS: 0
>>>> Feature Compat: 3 backup-super strict-journal-super
>>>> Feature Incompat: 592 sparse inline-data xattr
>>>> Tunefs Incomplete: 0
>>>> Feature RO compat: 1 unwritten
>>>> Root Blknum: 5   System Dir Blknum: 6
>>>> First Cluster Group Blknum: 3
>>>> Block Size Bits: 12   Cluster Size Bits: 12
>>>> Max Node Slots: 16
>>>> Extended Attributes Inline Size: 256
>>>> Label:
>>>> UUID: 238F878003E7455FA5B01CC884D1047F
>>>> Hash: 919897149 (0x36d4843d)
>>>> DX Seed[0]: 0x
>>>> DX Seed[1]: 0x
>>>> DX Seed[2]: 0x
>>>> Cluster stack: classic o2cb
>>>> Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>>>> FS Generation: 1754092981 (0x688d55b5)
>>>> CRC32:   ECC:
>>>> Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>>> Dynamic Features: (0x0)
>>>> User: 0 (root)   Group: 0 (root)   Size: 0
>>>> Links: 0   Clusters: 64000
>>>> ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>> atime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>> mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>> dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>> ctime_nsec: 0x -- 0
>>>> atime_nsec: 0x -- 0
>>>> mtime_nsec: 0x -- 0
>>>> Refcount Block: 0
>>>> Last Extblk: 0   Orphan Slot: 0
>>>> Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>>>
>>>> thanks for the assistance
>>>>
>>>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren wrote:
>>>>> Hi,
>>>>>
>>>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have an ocfs2 mount point of 3 ceph cluster nodes
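A request like "paste the code around line 2419" boils down to printing a window of lines from a file. A minimal sketch, where `context` is a hypothetical helper demonstrated on a synthetic file; against the real tree one would run something like `context linux-4.2/fs/ocfs2/suballoc.c 2419`.

```shell
# context FILE LINE [RADIUS]: print RADIUS lines either side of LINE.
context() {
    file=$1; line=$2; radius=${3:-10}
    start=$((line - radius))
    [ "$start" -lt 1 ] && start=1        # clamp at the top of the file
    sed -n "${start},$((line + radius))p" "$file"
}

# Self-contained demonstration on a synthetic 20-line file:
seq 20 | sed 's/^/line /' > /tmp/demo.txt
context /tmp/demo.txt 10 2
# -> prints "line 8" through "line 12"
```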
Re: [Ocfs2-users] ocf2 mount point hangs
Hi Eric,

Could you paste the code context around this line?
Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
/build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!

Apologies, but I tried to understand this but failed

root@nodeB:~# echo w > /proc/sysrq-trigger
root@nodeB:~#

The node rebooted and the mount points are accessible from all 3 nodes. Not
sure why, but it seems it will be difficult to figure out what went wrong
with ocfs2 without proper knowledge, so let me not waste any of your time;
let me figure out `crash`[1][2] or gdb, then hopefully when it happens next
time I will have a much better understanding

On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren wrote:
> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
>>
>> Hi Eric,
>>
>> Sorry. Here are the other 2 syslogs if you need them, and the debug
>> output
>
> According to the logs, nodeB should be the first one that got the problem.
>
> Could you paste the code context around this line?
> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>
>> The request in the snip attached just hangs
>
> NodeB should have taken this exclusive cluster lock, so any commands
> trying to access that file will hang.
>
> Could you provide the output of `echo w > /proc/sysrq-trigger`? An OCFS2
> issue is not easy to debug if the developer cannot reproduce it locally,
> and this is the case. BTW, you can narrow it down with `crash`[1][2]
> or gdb if you have some knowledge of kernel stuff.
>
> [1] http://www.dedoimedo.com/computers/crash-analyze.html
> [2] https://people.redhat.com/anderson/crash_whitepaper/
>
> Eric
>
>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela wrote:
>>> Thanks for the response
>>>
>>> 1. the disk is a shared ceph rbd device
>>>
>>> #rbd showmapped
>>> id pool     image          snap device
>>> 1  vmimages block_vmimages -    /dev/rbd1
>>>
>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>>> ago
>>>
>>> 3. 3 ceph nodes all have the rbd image mapped and ocfs2 mounted
>>>
>>> commands used
>>>
>>> #sudo rbd map block_vmimages --pool vmimages --name
>>>
>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>> /dev/rbd1
>>>
>>> 4.
>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>> Revision: 0.90
>>> Mount Count: 0   Max Mount Count: 20
>>> State: 0   Errors: 0
>>> Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>> Creator OS: 0
>>> Feature Compat: 3 backup-super strict-journal-super
>>> Feature Incompat: 592 sparse inline-data xattr
>>> Tunefs Incomplete: 0
>>> Feature RO compat: 1 unwritten
>>> Root Blknum: 5   System Dir Blknum: 6
>>> First Cluster Group Blknum: 3
>>> Block Size Bits: 12   Cluster Size Bits: 12
>>> Max Node Slots: 16
>>> Extended Attributes Inline Size: 256
>>> Label:
>>> UUID: 238F878003E7455FA5B01CC884D1047F
>>> Hash: 919897149 (0x36d4843d)
>>> DX Seed[0]: 0x
>>> DX Seed[1]: 0x
>>> DX Seed[2]: 0x
>>> Cluster stack: classic o2cb
>>> Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>>> FS Generation: 1754092981 (0x688d55b5)
>>> CRC32:   ECC:
>>> Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>> Dynamic Features: (0x0)
>>> User: 0 (root)   Group: 0 (root)   Size: 0
>>> Links: 0   Clusters: 64000
>>> ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>> atime: 0x0 -- Thu Jan  1 02:00:00 1970
>>> mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>> dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>>> ctime_nsec: 0x -- 0
>>> atime_nsec: 0x -- 0
>>> mtime_nsec: 0x -- 0
>>> Refcount Block: 0
>>> Last Extblk: 0   Orphan Slot: 0
>>> Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>>
>>> thanks for the assistance
>>>
>>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren wrote:
>>>> Hi,
>>>>
>>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>>> Hi All,
>>>>>
>>>>> I have an ocfs2 mount point of 3 ceph cluster nodes and suddenly I
>>>>> cannot read and write to the mount point although the cluster is clean
>>>>> and showing no errors.
>>>>
>>>> 1. What is your ocfs2 shared disk? I mean, is it a shared disk exported
>>>> by an iscsi target, or a ceph rbd device?
>>>> 2. Did you check if ocfs2 works well before any read/write? And how?
>>>> 3. Could you elaborate in more detail on how the ceph nodes use ocfs2?
>>>> 4. Please provide the output of:
>>>> #sudo debugfs.ocfs2 -R stats /dev/sda
>>>>
>>>>> Are there any other logs I can check?
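The empty prompt after `echo w > /proc/sysrq-trigger` above is expected: the blocked-task report goes to the kernel ring buffer, not to the terminal. A sketch of reading it back; the log lines below are a made-up sample in the style of the thread, while the commented commands are what one would run on the real node (as root).

```shell
# On the real node one would run:
#   echo w > /proc/sysrq-trigger
#   dmesg | sed -n '/Show Blocked State/,$p'
#
# Demonstrated here on a synthetic ring-buffer capture: D-state
# (uninterruptible sleep) tasks are the ones stuck behind the lock.
cat > /tmp/dmesg.sample <<'EOF'
[1104500.000001] SysRq : Show Blocked State
[1104500.000002]   task                        PC stack   pid father
[1104500.000003] cp              D ffff880a61dca940     0 65016      1
[1104500.000004] flush-252:16    D ffff880a61dca940     0   412      2
EOF
sed -n '/Show Blocked State/,$p' /tmp/dmesg.sample | grep ' D '
# -> the two D-state task lines (cp and flush-252:16)
```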
Re: [Ocfs2-users] ocf2 mount point hangs
On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
> Hi Eric,
>
> Sorry. Here are the other 2 syslogs if you need them, and the debug output

According to the logs, nodeB should be the first one that got the problem.

Could you paste the code context around this line?
Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
/build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!

> The request in the snip attached just hangs

NodeB should have taken this exclusive cluster lock, so any commands trying
to access that file will hang.

Could you provide the output of `echo w > /proc/sysrq-trigger`? An OCFS2
issue is not easy to debug if the developer cannot reproduce it locally, and
this is the case. BTW, you can narrow it down with `crash`[1][2] or gdb if
you have some knowledge of kernel stuff.

[1] http://www.dedoimedo.com/computers/crash-analyze.html
[2] https://people.redhat.com/anderson/crash_whitepaper/

Eric

> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela wrote:
>> Thanks for the response
>>
>> 1. the disk is a shared ceph rbd device
>>
>> #rbd showmapped
>> id pool     image          snap device
>> 1  vmimages block_vmimages -    /dev/rbd1
>>
>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>> ago
>>
>> 3. 3 ceph nodes all have the rbd image mapped and ocfs2 mounted
>>
>> commands used
>>
>> #sudo rbd map block_vmimages --pool vmimages --name
>>
>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>> /dev/rbd1
>>
>> 4.
>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>> Revision: 0.90
>> Mount Count: 0   Max Mount Count: 20
>> State: 0   Errors: 0
>> Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>> Creator OS: 0
>> Feature Compat: 3 backup-super strict-journal-super
>> Feature Incompat: 592 sparse inline-data xattr
>> Tunefs Incomplete: 0
>> Feature RO compat: 1 unwritten
>> Root Blknum: 5   System Dir Blknum: 6
>> First Cluster Group Blknum: 3
>> Block Size Bits: 12   Cluster Size Bits: 12
>> Max Node Slots: 16
>> Extended Attributes Inline Size: 256
>> Label:
>> UUID: 238F878003E7455FA5B01CC884D1047F
>> Hash: 919897149 (0x36d4843d)
>> DX Seed[0]: 0x
>> DX Seed[1]: 0x
>> DX Seed[2]: 0x
>> Cluster stack: classic o2cb
>> Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>> FS Generation: 1754092981 (0x688d55b5)
>> CRC32:   ECC:
>> Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>> Dynamic Features: (0x0)
>> User: 0 (root)   Group: 0 (root)   Size: 0
>> Links: 0   Clusters: 64000
>> ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>> atime: 0x0 -- Thu Jan  1 02:00:00 1970
>> mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>> dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>> ctime_nsec: 0x -- 0
>> atime_nsec: 0x -- 0
>> mtime_nsec: 0x -- 0
>> Refcount Block: 0
>> Last Extblk: 0   Orphan Slot: 0
>> Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>
>> thanks for the assistance
>>
>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren wrote:
>>> Hi,
>>>
>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>> Hi All,
>>>>
>>>> I have an ocfs2 mount point of 3 ceph cluster nodes and suddenly I
>>>> cannot read and write to the mount point although the cluster is clean
>>>> and showing no errors.
>>>
>>> 1. What is your ocfs2 shared disk? I mean, is it a shared disk exported
>>> by an iscsi target, or a ceph rbd device?
>>> 2. Did you check if ocfs2 works well before any read/write? And how?
>>> 3. Could you elaborate in more detail on how the ceph nodes use ocfs2?
>>> 4. Please provide the output of:
>>> #sudo debugfs.ocfs2 -R stats /dev/sda
>>>
>>>> Are there any other logs I can check?
>>>
>>> All log messages should go to /var/log/messages; could you attach the
>>> whole log file?
>>>
>>> Eric
>>>
>>>> There are some logs in kern.log about it:
>>>>
>>>> kern.log
>>>>
>>>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>>>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>>> Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode: [#1] SMP
>>>> Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
>>>> vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
>>>> mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
>>>> xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
>>>> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>> xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
>>>> ebtable_filter ebtables
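The single most useful datum in a log like the one above is the `file:line` in the "kernel BUG at" entry. A sketch of pulling it out for a report; the sample line is copied from the thread, and note the `/build/...` prefix is the distro's build tree, so only the tail (fs/ocfs2/suballoc.c:2419) maps onto an unpacked source tarball.

```shell
# Sample kern.log entry from the thread.
cat > /tmp/kern.sample <<'EOF'
Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
EOF

# Extract the "path:line" between "kernel BUG at" and the trailing "!".
sed -n 's/.*kernel BUG at \(.*\)!.*/\1/p' /tmp/kern.sample
# -> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419
```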
Re: [Ocfs2-users] ocf2 mount point hangs
Hi Eric,

Sorry. Here are the other 2 syslogs if you need them, and the debug output

nodeD

root@nodeD:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 592 sparse inline-data xattr
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 16
Extended Attributes Inline Size: 256
Label:
UUID: 238F878003E7455FA5B01CC884D1047F
Hash: 919897149 (0x36d4843d)
DX Seed[0]: 0x
DX Seed[1]: 0x
DX Seed[2]: 0x
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
FS Generation: 1754092981 (0x688d55b5)
CRC32:   ECC:
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 64000
ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
atime: 0x0 -- Thu Jan  1 02:00:00 1970
mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
dtime: 0x0 -- Thu Jan  1 02:00:00 1970
ctime_nsec: 0x -- 0
atime_nsec: 0x -- 0
mtime_nsec: 0x -- 0
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535

nodeB

root@nodeB:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 592 sparse inline-data xattr
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 16
Extended Attributes Inline Size: 256
Label:
UUID: 238F878003E7455FA5B01CC884D1047F
Hash: 919897149 (0x36d4843d)
DX Seed[0]: 0x
DX Seed[1]: 0x
DX Seed[2]: 0x
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
FS Generation: 1754092981 (0x688d55b5)
CRC32:   ECC:
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 64000
ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
atime: 0x0 -- Thu Jan  1 02:00:00 1970
mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
dtime: 0x0 -- Thu Jan  1 02:00:00 1970
ctime_nsec: 0x -- 0
atime_nsec: 0x -- 0
mtime_nsec: 0x -- 0
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535

The request in the snip attached just hangs

On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela wrote:
> Thanks for the response
>
> 1. the disk is a shared ceph rbd device
>
> #rbd showmapped
> id pool     image          snap device
> 1  vmimages block_vmimages -    /dev/rbd1
>
> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days ago
>
> 3. 3 ceph nodes all have the rbd image mapped and ocfs2 mounted
>
> commands used
>
> #sudo rbd map block_vmimages --pool vmimages --name
>
> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
> /dev/rbd1
>
> 4.
> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
> Revision: 0.90
> Mount Count: 0   Max Mount Count: 20
> State: 0   Errors: 0
> Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
> Creator OS: 0
> Feature Compat: 3 backup-super strict-journal-super
> Feature Incompat: 592 sparse inline-data xattr
> Tunefs Incomplete: 0
> Feature RO compat: 1 unwritten
> Root Blknum: 5   System Dir Blknum: 6
> First Cluster Group Blknum: 3
> Block Size Bits: 12   Cluster Size Bits: 12
> Max Node Slots: 16
> Extended Attributes Inline Size: 256
> Label:
> UUID: 238F878003E7455FA5B01CC884D1047F
> Hash: 919897149 (0x36d4843d)
> DX Seed[0]: 0x
> DX Seed[1]: 0x
> DX Seed[2]: 0x
> Cluster stack: classic o2cb
> Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
> FS Generation: 1754092981
Re: [Ocfs2-users] ocf2 mount point hangs
Hi,

On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
> Hi All,
>
> I have an ocfs2 mount point of 3 ceph cluster nodes and suddenly I
> cannot read and write to the mount point although the cluster is clean
> and showing no errors.

1. What is your ocfs2 shared disk? I mean, is it a shared disk exported by
an iscsi target, or a ceph rbd device?
2. Did you check if ocfs2 works well before any read/write? And how?
3. Could you elaborate in more detail on how the ceph nodes use ocfs2?
4. Please provide the output of:
#sudo debugfs.ocfs2 -R stats /dev/sda

> Are there any other logs I can check?

All log messages should go to /var/log/messages; could you attach the whole
log file?

Eric

> There are some logs in kern.log about it:
>
> kern.log
>
> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
> Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode: [#1] SMP
> Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
> vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
> mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
> xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
> ip_tables x_tables dell_rbu ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue configfs bridge stp llc binfmt_misc
> ipmi_devintf kvm_amd dcdbas kvm input_leds joydev amd64_edac_mod
> crct10dif_pclmul edac_core shpchp i2c_piix4 fam15h_power crc32_pclmul
> edac_mce_amd ipmi_ssif k10temp aesni_intel aes_x86_64 lrw gf128mul
> 8250_fintek glue_helper acpi_power_meter mac_hid serio_raw ablk_helper
> cryptd ipmi_msghandler xfs libcrc32c lp parport ixgbe dca hid_generic
> uas usbhid vxlan usb_storage ip6_udp_tunnel hid udp_tunnel ptp psmouse
> bnx2 pps_core megaraid_sas mdio [last unloaded: ipmi_si]
> Sep 13 08:10:18 nodeB kernel: [1104431.898986] CPU: 10 PID: 65016
> Comm: cp Not tainted 4.2.0-27-generic #32~14.04.1-Ubuntu
> Sep 13 08:10:18 nodeB kernel: [1104432.012469] Hardware name: Dell
> Inc. PowerEdge R515/0RMRF7, BIOS 2.0.2 10/22/2012
> Sep 13 08:10:18 nodeB kernel: [1104432.134659] task: 880a61dca940
> ti: 88084a5ac000 task.ti: 88084a5ac000
> Sep 13 08:10:18 nodeB kernel: [1104432.265260] RIP: 0010:[] []
> _ocfs2_free_suballoc_bits+0x4db/0x4e0 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104432.406559] RSP:
> 0018:88084a5af798 EFLAGS: 00010246
> Sep 13 08:10:18 nodeB kernel: [1104432.479958] RAX: RBX:
> 881acebcb000 RCX: 881fcd372e00
> Sep 13 08:10:18 nodeB kernel: [1104432.630768] RDX: 881fd0d4dc30
> RSI: 88197e351bc8 RDI: 880fd127b2b0
> Sep 13 08:10:18 nodeB kernel: [1104432.789688] RBP: 88084a5af818
> R08: 0002 R09: 7e00
> Sep 13 08:10:18 nodeB kernel: [1104432.950053] R10: 880d39a21020
> R11: 88084a5af550 R12: 00fa
> Sep 13 08:10:18 nodeB kernel: [1104433.113014] R13: 5ab1 R14: R15:
> 880fb2d43000
> Sep 13 08:10:18 nodeB kernel: [1104433.276484] FS: 7fcc68373840()
> GS:881fdde8() knlGS:
> Sep 13 08:10:18 nodeB kernel: [1104433.440016] CS: 0010 DS: ES: CR0:
> 8005003b
> Sep 13 08:10:18 nodeB kernel: [1104433.521496] CR2: 5647b2ee6d80
> CR3: 000198b93000 CR4: 000406e0
> Sep 13 08:10:18 nodeB kernel: [1104433.681357] Stack:
> Sep 13 08:10:18 nodeB kernel: [1104433.758498] 880fd127b2e8
> 881fc6655f08 5bab
> Sep 13 08:10:18 nodeB kernel: [1104433.913655] 881fd0c51d80
> 88197e351bc8 880fd127b330 880e9eaa6000
> Sep 13 08:10:18 nodeB kernel: [1104434.068609] 88197e351bc8
> 817ba6d6 0001 1ac592b1
> Sep 13 08:10:18 nodeB kernel: [1104434.223347] Call Trace:
> Sep 13 08:10:18 nodeB kernel: [1104434.298560] [] ? mutex_lock+0x16/0x37
> Sep 13 08:10:18 nodeB kernel: [1104434.374183] []
> _ocfs2_free_clusters+0xea/0x200 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.449628] [] ?
> ocfs2_put_slot+0xe0/0xe0 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.523971] [] ?
> ocfs2_put_slot+0xe0/0xe0 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.595803] []
> ocfs2_free_clusters+0x15/0x20 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.14] []
> __ocfs2_flush_truncate_log+0x247/0x560 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.806017] [] ?
> ocfs2_num_free_extents+0x56/0x120 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.946141] []
> ocfs2_remove_btree_range+0x4e8/0x760 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104435.086490] []
> ocfs2_commit_truncate+0x180/0x590 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104435.158189] [] ?
> ocfs2_allocate_extend_trans+0x130/0x130
[Ocfs2-users] ocf2 mount point hangs
Hi All,

I have an ocfs2 mount point of 3 ceph cluster nodes and suddenly I
cannot read and write to the mount point although the cluster is clean
and showing no errors.

Are there any other logs I can check?

There are some logs in kern.log about it:

kern.log

Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
/build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode: [#1] SMP
Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
ip_tables x_tables dell_rbu ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue configfs bridge stp llc binfmt_misc
ipmi_devintf kvm_amd dcdbas kvm input_leds joydev amd64_edac_mod
crct10dif_pclmul edac_core shpchp i2c_piix4 fam15h_power crc32_pclmul
edac_mce_amd ipmi_ssif k10temp aesni_intel aes_x86_64 lrw gf128mul
8250_fintek glue_helper acpi_power_meter mac_hid serio_raw ablk_helper
cryptd ipmi_msghandler xfs libcrc32c lp parport ixgbe dca hid_generic
uas usbhid vxlan usb_storage ip6_udp_tunnel hid udp_tunnel ptp psmouse
bnx2 pps_core megaraid_sas mdio [last unloaded: ipmi_si]
Sep 13 08:10:18 nodeB kernel: [1104431.898986] CPU: 10 PID: 65016
Comm: cp Not tainted 4.2.0-27-generic #32~14.04.1-Ubuntu
Sep 13 08:10:18 nodeB kernel: [1104432.012469] Hardware name: Dell
Inc. PowerEdge R515/0RMRF7, BIOS 2.0.2 10/22/2012
Sep 13 08:10:18 nodeB kernel: [1104432.134659] task: 880a61dca940 ti:
88084a5ac000 task.ti: 88084a5ac000
Sep 13 08:10:18 nodeB kernel: [1104432.265260] RIP: 0010:[] []
_ocfs2_free_suballoc_bits+0x4db/0x4e0 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104432.406559] RSP: 0018:88084a5af798
EFLAGS: 00010246
Sep 13 08:10:18 nodeB kernel: [1104432.479958] RAX: RBX: 881acebcb000
RCX: 881fcd372e00
Sep 13 08:10:18 nodeB kernel: [1104432.630768] RDX: 881fd0d4dc30 RSI:
88197e351bc8 RDI: 880fd127b2b0
Sep 13 08:10:18 nodeB kernel: [1104432.789688] RBP: 88084a5af818 R08:
0002 R09: 7e00
Sep 13 08:10:18 nodeB kernel: [1104432.950053] R10: 880d39a21020 R11:
88084a5af550 R12: 00fa
Sep 13 08:10:18 nodeB kernel: [1104433.113014] R13: 5ab1 R14: R15:
880fb2d43000
Sep 13 08:10:18 nodeB kernel: [1104433.276484] FS: 7fcc68373840()
GS:881fdde8() knlGS:
Sep 13 08:10:18 nodeB kernel: [1104433.440016] CS: 0010 DS: ES: CR0:
8005003b
Sep 13 08:10:18 nodeB kernel: [1104433.521496] CR2: 5647b2ee6d80 CR3:
000198b93000 CR4: 000406e0
Sep 13 08:10:18 nodeB kernel: [1104433.681357] Stack:
Sep 13 08:10:18 nodeB kernel: [1104433.758498] 880fd127b2e8
881fc6655f08 5bab
Sep 13 08:10:18 nodeB kernel: [1104433.913655] 881fd0c51d80
88197e351bc8 880fd127b330 880e9eaa6000
Sep 13 08:10:18 nodeB kernel: [1104434.068609] 88197e351bc8
817ba6d6 0001 1ac592b1
Sep 13 08:10:18 nodeB kernel: [1104434.223347] Call Trace:
Sep 13 08:10:18 nodeB kernel: [1104434.298560] [] ? mutex_lock+0x16/0x37
Sep 13 08:10:18 nodeB kernel: [1104434.374183] []
_ocfs2_free_clusters+0xea/0x200 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.449628] [] ?
ocfs2_put_slot+0xe0/0xe0 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.523971] [] ?
ocfs2_put_slot+0xe0/0xe0 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.595803] []
ocfs2_free_clusters+0x15/0x20 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.14] []
__ocfs2_flush_truncate_log+0x247/0x560 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.806017] [] ?
ocfs2_num_free_extents+0x56/0x120 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.946141] []
ocfs2_remove_btree_range+0x4e8/0x760 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.086490] []
ocfs2_commit_truncate+0x180/0x590 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.158189] [] ?
ocfs2_allocate_extend_trans+0x130/0x130 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.297235] []
ocfs2_truncate_file+0x39c/0x610 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.368060] [] ?
ocfs2_read_inode_block+0x10/0x20 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.505117] []
ocfs2_setattr+0x4b7/0xa50 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.574617] [] ?
ocfs2_xattr_get+0x9d/0x130 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.643722] [] notify_change+0x1ae/0x380
Sep 13 08:10:18 nodeB kernel: [1104435.712037] [] do_truncate+0x66/0xa0
Sep 13 08:10:18 nodeB kernel: [1104435.778685] [] path_openat+0x277/0x1330
Sep 13 08:10:18 nodeB kernel: [1104435.845776] [] ?