Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-14 Thread Ishmael Tsoaela
Thanks,

Still nothing when using find / -iname '*suballoc.*'

This is the code:


2398     /* The caller got this descriptor from
2399      * ocfs2_read_group_descriptor().  Any corruption is a code bug. */
2400     BUG_ON(!OCFS2_IS_VALID_GROUP_DESC(bg));
2401
2402     trace_ocfs2_block_group_clear_bits(bit_off, num_bits);
2403
2404     BUG_ON(undo_fn && !ocfs2_is_cluster_bitmap(alloc_inode));
2405     status = ocfs2_journal_access_gd(handle, INODE_CACHE(alloc_inode),
2406                                      group_bh,
2407                                      undo_fn ?
2408                                      OCFS2_JOURNAL_ACCESS_UNDO :
2409                                      OCFS2_JOURNAL_ACCESS_WRITE);
2410     if (status < 0) {
2411         mlog_errno(status);
2412         goto bail;
2413     }
2414
2415     if (undo_fn) {
2416         jbd_lock_bh_state(group_bh);
2417         undo_bg = (struct ocfs2_group_desc *)
2418                     bh2jh(group_bh)->b_committed_data;
2419         BUG_ON(!undo_bg);
2420     }
2421
2422     tmp = num_bits;
2423     while(tmp--) {
2424         ocfs2_clear_bit((bit_off + tmp),
2425                         (unsigned long *) bg->bg_bitmap);
2426         if (undo_fn)
2427             undo_fn(bit_off + tmp,
2428                     (unsigned long *) undo_bg->bg_bitmap);
2429     }
2430     le16_add_cpu(&bg->bg_free_bits_count, num_bits);
2431     if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) {
2432         ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit"
2433                     " count %u but claims %u are freed. num_bits %d",
2434                     (unsigned long long)le64_to_cpu(bg->bg_blkno),
2435                     le16_to_cpu(bg->bg_bits),
2436                     le16_to_cpu(bg->bg_free_bits_count), num_bits);
2437         return -EROFS;
2438     }
2439
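
For anyone repeating this step, here is a minimal sketch of one way to print the numbered lines around a location reported by the kernel, such as suballoc.c:2419. The helper name show_context and its default radius of 10 lines are illustrative and not from this thread. Line numbers in the upstream tarball may also differ from the distribution's patched build tree, so treat the result as approximate.

```shell
# show_context FILE LINE [RADIUS]
# Print LINE from FILE with RADIUS lines of context on each side,
# each prefixed with its line number.
show_context() {
    file=$1
    line=$2
    radius=${3:-10}
    start=$((line - radius))
    if [ "$start" -lt 1 ]; then
        start=1    # clamp to the first line of the file
    fi
    end=$((line + radius))
    awk -v s="$start" -v e="$end" \
        'NR >= s && NR <= e { printf "%d %s\n", NR, $0 }' "$file"
}
```

With the unpacked tree mentioned below, a call could look like: show_context linux-4.2/fs/ocfs2/suballoc.c 2419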


On Wed, Sep 14, 2016 at 1:52 PM, Werner Flamme  wrote:
> Ishmael Tsoaela [14.09.2016 13:43]:
>> Thanks for the response. I actually downloaded the source code for
>> kernel version 4.2.0, since I am on the same version:
>>
>> # uname -r
>> 4.2.0-27-generic
>>
>> wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux/linux_4.2.0.orig.tar.gz
>> tar xvf linux_4.2.0.orig.tar.gz
>>
>> cd  /home/ishmael/linux-4.2/fs/ocfs2
>>
>>
>> I found the suballoc.c in there.
>>
>>
>> I was not able to find the code on the OS itself
>>
>> root@nodeB:/# find / -iname *suballoc.*
>
> Better use # find / -iname '*suballoc.*', so that your shell will not
> make the standard replacements on * :)
>
>>
>> Will the code in the last email suffice?
>
> I wouldn't know until you tell us that the code in your file is the same
> as you posted :)
>
> Werner
>
> --
>
>
>
> ___
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-users
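
The quoting advice above can be illustrated with a small, self-contained sketch. The scratch directory and file names are made up for the demonstration. It only shows how the shell treats the pattern; it says nothing about whether the kernel source is installed on the node. Note also that when nothing in the current directory matches, bash by default passes the unquoted pattern through unchanged, which may be why quoting made no difference when searching from /.

```shell
# Scratch directory with one match at the top level and one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/src"
touch "$demo/suballoc.h" "$demo/src/suballoc.c"

# Unquoted: the shell expands the glob against the current directory first,
# so find is actually run as: find . -iname suballoc.h
unquoted=$(cd "$demo" && find . -iname *suballoc.*)

# Quoted: find receives the pattern itself and applies it at every depth.
quoted=$(cd "$demo" && find . -iname '*suballoc.*' | sort)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
rm -r "$demo"
```

Here the unquoted form reports only ./suballoc.h, while the quoted form also finds ./src/suballoc.c.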



Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-14 Thread Ishmael Tsoaela
Hi Eric,

I found the code below from
archive.ubuntu.com/ubuntu/pool/main/l/linux/fs/ocfs2/suballoc.c




2410     if (status < 0) {
2411         mlog_errno(status);
2412         goto bail;
2413     }
2414
2415     if (undo_fn) {
2416         jbd_lock_bh_state(group_bh);
2417         undo_bg = (struct ocfs2_group_desc *)
2418                     bh2jh(group_bh)->b_committed_data;
2419         BUG_ON(!undo_bg);
2420     }
2421
2422     tmp = num_bits;
2423     while(tmp--) {
2424         ocfs2_clear_bit((bit_off + tmp),
2425                         (unsigned long *) bg->bg_bitmap);
2426         if (undo_fn)
2427             undo_fn(bit_off + tmp,
2428                     (unsigned long *) undo_bg->bg_bitmap);
2429     }
2430     le16_add_cpu(&bg->bg_free_bits_count, num_bits);
2431     if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) {
2432         ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit"
2433                     " count %u but claims %u are freed. num_bits %d",
2434                     (unsigned long long)le64_to_cpu(bg->bg_blkno),
2435                     le16_to_cpu(bg->bg_bits),
2436                     le16_to_cpu(bg->bg_free_bits_count), num_bits);
2437         return -EROFS;
2438     }

On Wed, Sep 14, 2016 at 10:13 AM, Eric Ren  wrote:
> Hi,
>
> On 09/14/2016 02:30 PM, Ishmael Tsoaela wrote:
>>
>> Hi Eric,
>>
>> Could you paste the code context around this line?
>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>>
>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>
> This message is very important because it shows exactly which line of the
> source code triggered this BUG() output. What I want you to do is paste the
> code around line 2419 of suballoc.c, so that I can locate the BUG() in my
> local tree, because the code at line 2419 differs between kernel versions.
>>
>>
>> Apologies but I tried to understand this but failed
>>
>>
>> root@nodeB:~# echo w > /proc/sysrq-trigger
>> root@nodeB:~#
>>
>> After the node reboot, the mount points are accessible from all 3 nodes. I am
>> not sure why, but it seems it will be difficult to figure out what went wrong
>> with ocfs2 without proper knowledge, so let me not waste any of your
>> time. Let me figure out `crash`[1][2] or gdb; then hopefully when it
>> happens next time I will have a much better understanding.
>
> OK, good luck!
>
>
> Eric
>>
>>
>> On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren  wrote:
>>>
>>> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:

>>>> Hi Eric,
>>>>
>>>> Sorry Here are the other 2 syslogs if you need and debug output
>>>
>>> According to the logs,  the nodeB should be the first one that got
>>> problem.
>>>
>>> Could you paste the code context around this line?
>>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>>>
>>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!

>>>> The request in the snip attached just hangs
>>>
>>> NodeB should have taken this exclusive cluster lock, so any commands
>>> trying
>>> to access that file will hang up.
>>>
>>> Could you provide the output of `echo w > /proc/sysrq-trigger`? OCFS2
>>> issue
>>> is not easy to debug if developer cannot reproduce
>>> it locally, and this is the case. BTW, you can narrow down by
>>> `crash`[1][2]
>>> or gdb if you have some knowledge of kernel stuff.
>>>
>>> [1] http://www.dedoimedo.com/computers/crash-analyze.html
>>> [2] https://people.redhat.com/anderson/crash_whitepaper/
>>>
>>> Eric
>>>







>>>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela wrote:
>>>>>
>>>>> Thanks for the response
>>>>>
>>>>>
>>>>> 1.  the disk is a shared ceph rbd device
>>>>>
>>>>> #rbd showmapped
>>>>> id  pool      image           snap  device
>>>>> 1   vmimages  block_vmimages  -     /dev/rbd1
>>>>>
>>>>>
>>>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>>>>> ago
>>>>>
>>>>> 3.  3 ceph nodes all have rbd image mapped and ocfs2 mounted
>>>>>
>>>>> commands used
>>>>>
>>>>> #sudo rbd map block_vmimages  --pool vmimages --name
>>>>>
>>>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>>>> /dev/rbd1
>>>>>
>>>>> 4.
>>>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>>>>   Revision: 0.90
>>>>>   Mount Count: 0   Max Mount Count: 20
>>>>>   State: 0   Errors: 0
>>>>>   Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>>>>   Creator OS: 0
>>>>>   Feature Compat: 3 backup-super strict-journal-super
>>>>>   Feature Incompat: 592 sparse inline-data xattr
>>>>>   Tunefs Incomplete: 0
>>>>>   Feature RO compat: 1 unwritten

Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-14 Thread Eric Ren
Hi,

On 09/14/2016 02:30 PM, Ishmael Tsoaela wrote:
> Hi Eric,
>
> Could you paste the code context around this line?
> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
This message is very important because it shows exactly which line of the
source code triggered this BUG() output. What I want you to do is paste the
code around line 2419 of suballoc.c, so that I can locate the BUG() in my
local tree, because the code at line 2419 differs between kernel versions.
>
> Apologies but I tried to understand this but failed
>
>
> root@nodeB:~# echo w > /proc/sysrq-trigger
> root@nodeB:~#
>
> After the node reboot, the mount points are accessible from all 3 nodes. I am
> not sure why, but it seems it will be difficult to figure out what went wrong
> with ocfs2 without proper knowledge, so let me not waste any of your
> time. Let me figure out `crash`[1][2] or gdb; then hopefully when it
> happens next time I will have a much better understanding.
OK, good luck!

Eric
>
> On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren  wrote:
>> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
>>> Hi Eric,
>>>
>>> Sorry Here are the other 2 syslogs if you need and debug output
>> According to the logs,  the nodeB should be the first one that got problem.
>>
>> Could you paste the code context around this line?
>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>> The request in the snip attached just hangs
>> NodeB should have taken this exclusive cluster lock, so any commands trying
>> to access that file will hang up.
>>
>> Could you provide the output of `echo w > /proc/sysrq-trigger`? OCFS2 issue
>> is not easy to debug if developer cannot reproduce
>> it locally, and this is the case. BTW, you can narrow down by `crash`[1][2]
>> or gdb if you have some knowledge of kernel stuff.
>>
>> [1] http://www.dedoimedo.com/computers/crash-analyze.html
>> [2] https://people.redhat.com/anderson/crash_whitepaper/
>>
>> Eric
>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela 
>>> wrote:
>>>> Thanks for the response
>>>>
>>>>
>>>> 1.  the disk is a shared ceph rbd device
>>>>
>>>> #rbd showmapped
>>>> id  pool      image           snap  device
>>>> 1   vmimages  block_vmimages  -     /dev/rbd1
>>>>
>>>>
>>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>>>> ago
>>>>
>>>> 3.  3 ceph nodes all have rbd image mapped and ocfs2 mounted
>>>>
>>>> commands used
>>>>
>>>> #sudo rbd map block_vmimages  --pool vmimages --name
>>>>
>>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>>> /dev/rbd1
>>>>
>>>> 4.
>>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>>>   Revision: 0.90
>>>>   Mount Count: 0   Max Mount Count: 20
>>>>   State: 0   Errors: 0
>>>>   Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>>>   Creator OS: 0
>>>>   Feature Compat: 3 backup-super strict-journal-super
>>>>   Feature Incompat: 592 sparse inline-data xattr
>>>>   Tunefs Incomplete: 0
>>>>   Feature RO compat: 1 unwritten
>>>>   Root Blknum: 5   System Dir Blknum: 6
>>>>   First Cluster Group Blknum: 3
>>>>   Block Size Bits: 12   Cluster Size Bits: 12
>>>>   Max Node Slots: 16
>>>>   Extended Attributes Inline Size: 256
>>>>   Label:
>>>>   UUID: 238F878003E7455FA5B01CC884D1047F
>>>>   Hash: 919897149 (0x36d4843d)
>>>>   DX Seed[0]: 0x
>>>>   DX Seed[1]: 0x
>>>>   DX Seed[2]: 0x
>>>>   Cluster stack: classic o2cb
>>>>   Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>>>>   FS Generation: 1754092981 (0x688d55b5)
>>>>   CRC32:    ECC: 
>>>>   Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>>>   Dynamic Features: (0x0)
>>>>   User: 0 (root)   Group: 0 (root)   Size: 0
>>>>   Links: 0   Clusters: 64000
>>>>   ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>>   atime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>>   mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>>   dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>>   ctime_nsec: 0x -- 0
>>>>   atime_nsec: 0x -- 0
>>>>   mtime_nsec: 0x -- 0
>>>>   Refcount Block: 0
>>>>   Last Extblk: 0   Orphan Slot: 0
>>>>   Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>>>
>>>>
>>>>
>>>> thanks for the assistance
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren wrote:
>>>>> Hi,
>>>>>
>>>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have an ocfs2  mount point of 3 ceph cluster nodes 

Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-14 Thread Ishmael Tsoaela
Hi Eric,

Could you paste the code context around this line?
   Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
/build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!

Apologies, I tried to understand this but failed.


root@nodeB:~# echo w > /proc/sysrq-trigger
root@nodeB:~#

After the node reboot, the mount points are accessible from all 3 nodes. I am
not sure why, but it seems it will be difficult to figure out what went wrong
with ocfs2 without proper knowledge, so let me not waste any of your
time. Let me figure out `crash`[1][2] or gdb; then hopefully when it
happens next time I will have a much better understanding.
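
A note on the empty prompt above: writing w to the SysRq trigger prints nothing on the terminal, because the report on blocked tasks goes to the kernel log. Below is a minimal sketch of the sequence. The function name request_blocked_task_dump is made up for illustration, and the trigger path is a parameter only so the function can be dry-run against an ordinary file.

```shell
# request_blocked_task_dump [TRIGGER]
# Ask the kernel to log the state of all blocked (uninterruptible) tasks.
# TRIGGER defaults to the real SysRq interface and needs root to write.
request_blocked_task_dump() {
    trigger=${1:-/proc/sysrq-trigger}
    printf 'w\n' > "$trigger"
}

# On the affected node, as root, the report would then be read from the
# kernel log rather than from the terminal, for example:
#   request_blocked_task_dump
#   dmesg | tail -n 300
```

On a system that logs kernel messages to a file, the same report should also appear there (kern.log in this thread's setup).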

On Tue, Sep 13, 2016 at 11:44 AM, Eric Ren  wrote:
> On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
>>
>> Hi Eric,
>>
>> Sorry Here are the other 2 syslogs if you need and debug output
>
> According to the logs,  the nodeB should be the first one that got problem.
>
> Could you paste the code context around this line?
>Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>
>> The request in the snip attached just hangs
>
> NodeB should have taken this exclusive cluster lock, so any commands trying
> to access that file will hang up.
>
> Could you provide the output of `echo w > /proc/sysrq-trigger`? OCFS2 issue
> is not easy to debug if developer cannot reproduce
> it locally, and this is the case. BTW, you can narrow down by `crash`[1][2]
> or gdb if you have some knowledge of kernel stuff.
>
> [1] http://www.dedoimedo.com/computers/crash-analyze.html
> [2] https://people.redhat.com/anderson/crash_whitepaper/
>
> Eric
>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela 
>> wrote:
>>>
>>> Thanks for the response
>>>
>>>
>>> 1.  the disk is a shared ceph rbd device
>>>
>>>   #rbd showmapped
>>> id  pool      image           snap  device
>>> 1   vmimages  block_vmimages  -     /dev/rbd1
>>>
>>>
>>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days
>>> ago
>>>
>>> 3.  3 ceph nodes all have rbd image mapped and ocfs2 mounted
>>>
>>> commands used
>>>
>>> #sudo rbd map block_vmimages  --pool vmimages --name
>>>
>>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>>> /dev/rbd1
>>>
>>> 4.
>>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>>  Revision: 0.90
>>>  Mount Count: 0   Max Mount Count: 20
>>>  State: 0   Errors: 0
>>>  Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>>  Creator OS: 0
>>>  Feature Compat: 3 backup-super strict-journal-super
>>>  Feature Incompat: 592 sparse inline-data xattr
>>>  Tunefs Incomplete: 0
>>>  Feature RO compat: 1 unwritten
>>>  Root Blknum: 5   System Dir Blknum: 6
>>>  First Cluster Group Blknum: 3
>>>  Block Size Bits: 12   Cluster Size Bits: 12
>>>  Max Node Slots: 16
>>>  Extended Attributes Inline Size: 256
>>>  Label:
>>>  UUID: 238F878003E7455FA5B01CC884D1047F
>>>  Hash: 919897149 (0x36d4843d)
>>>  DX Seed[0]: 0x
>>>  DX Seed[1]: 0x
>>>  DX Seed[2]: 0x
>>>  Cluster stack: classic o2cb
>>>  Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>>>  FS Generation: 1754092981 (0x688d55b5)
>>>  CRC32:    ECC: 
>>>  Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>>  Dynamic Features: (0x0)
>>>  User: 0 (root)   Group: 0 (root)   Size: 0
>>>  Links: 0   Clusters: 64000
>>>  ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>  atime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>  mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>>  dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>>>  ctime_nsec: 0x -- 0
>>>  atime_nsec: 0x -- 0
>>>  mtime_nsec: 0x -- 0
>>>  Refcount Block: 0
>>>  Last Extblk: 0   Orphan Slot: 0
>>>  Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>>
>>>
>>>
>>> thanks for the assistance
>>>
>>>
>>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren  wrote:

>>>> Hi,
>>>>
>>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have an ocfs2  mount point of 3 ceph cluster nodes and suddenly I
>>>>> cannot read and write to the mount point although the cluster is clean
>>>>> and showing no errors.
>>>>
>>>> 1. What is your ocfs2 shared disk? I mean, is it a shared disk exported by
>>>> an iscsi target, or a ceph rbd device?
>>>> 2. Did you check whether ocfs2 worked well before any read/write? If so, how?
>>>> 3. Could you elaborate on how the ceph nodes use ocfs2?
>>>> 4. Please provide the output of:
>>>> #sudo debugfs.ocfs2 -R stats /dev/sda
>>>>>
>>>>>
>>>>>
>>>>> Are there any other logs I can check?

Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-13 Thread Eric Ren
On 09/13/2016 05:01 PM, Ishmael Tsoaela wrote:
> Hi Eric,
>
> Sorry Here are the other 2 syslogs if you need and debug output
According to the logs, nodeB should be the first node that hit the problem.

Could you paste the code context around this line?
Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at 
/build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
> The request in the snip attached just hangs
NodeB should have taken this exclusive cluster lock, so any command trying to
access that file will hang.

Could you provide the output of `echo w > /proc/sysrq-trigger`? An OCFS2 issue
is not easy to debug if the developer cannot reproduce it locally, and that is
the case here. BTW, you can narrow it down with `crash`[1][2] or gdb if you
have some knowledge of kernel internals.

[1] http://www.dedoimedo.com/computers/crash-analyze.html
[2] https://people.redhat.com/anderson/crash_whitepaper/

Eric
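
As a small illustration of the gdb route mentioned above: a trace entry such as _ocfs2_free_suballoc_bits+0x4db/0x4e0 has the form symbol+offset/length, and gdb can list the source for symbol+offset when it is given a module built with debug info. The sketch below only rewrites the trace entry into that gdb command. The function name frame_to_gdb_list is made up for this example, and obtaining an ocfs2.ko with debug symbols is assumed rather than shown.

```shell
# frame_to_gdb_list FRAME
# Turn a trace entry "symbol+0xOFF/0xSIZE" into a gdb "list" command.
frame_to_gdb_list() {
    # ${1%/*} drops the trailing "/0xSIZE" (the function's total length)
    printf 'list *(%s)\n' "${1%/*}"
}

frame_to_gdb_list '_ocfs2_free_suballoc_bits+0x4db/0x4e0'
# prints: list *(_ocfs2_free_suballoc_bits+0x4db)
```

The printed line could then be pasted into a gdb session opened on the debug build of the module. Where that file lives depends on the distribution's debug-symbol packaging.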
>
>
>
>
>
>
>
>
> On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela  wrote:
>> Thanks for the response
>>
>>
>> 1.  the disk is a shared ceph rbd device
>>
>>   #rbd showmapped
>> id  pool      image           snap  device
>> 1   vmimages  block_vmimages  -     /dev/rbd1
>>
>>
>> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days ago
>>
>> 3.  3 ceph nodes all have rbd image mapped and ocfs2 mounted
>>
>> commands used
>>
>> #sudo rbd map block_vmimages  --pool vmimages --name
>>
>> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
>> /dev/rbd1
>>
>> 4.
>> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
>>  Revision: 0.90
>>  Mount Count: 0   Max Mount Count: 20
>>  State: 0   Errors: 0
>>  Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
>>  Creator OS: 0
>>  Feature Compat: 3 backup-super strict-journal-super
>>  Feature Incompat: 592 sparse inline-data xattr
>>  Tunefs Incomplete: 0
>>  Feature RO compat: 1 unwritten
>>  Root Blknum: 5   System Dir Blknum: 6
>>  First Cluster Group Blknum: 3
>>  Block Size Bits: 12   Cluster Size Bits: 12
>>  Max Node Slots: 16
>>  Extended Attributes Inline Size: 256
>>  Label:
>>  UUID: 238F878003E7455FA5B01CC884D1047F
>>  Hash: 919897149 (0x36d4843d)
>>  DX Seed[0]: 0x
>>  DX Seed[1]: 0x
>>  DX Seed[2]: 0x
>>  Cluster stack: classic o2cb
>>  Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
>>  FS Generation: 1754092981 (0x688d55b5)
>>  CRC32:    ECC: 
>>  Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>>  Dynamic Features: (0x0)
>>  User: 0 (root)   Group: 0 (root)   Size: 0
>>  Links: 0   Clusters: 64000
>>  ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>  atime: 0x0 -- Thu Jan  1 02:00:00 1970
>>  mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
>>  dtime: 0x0 -- Thu Jan  1 02:00:00 1970
>>  ctime_nsec: 0x -- 0
>>  atime_nsec: 0x -- 0
>>  mtime_nsec: 0x -- 0
>>  Refcount Block: 0
>>  Last Extblk: 0   Orphan Slot: 0
>>  Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>>
>>
>>
>> thanks for the assistance
>>
>>
>> On Tue, Sep 13, 2016 at 10:23 AM, Eric Ren  wrote:
>>> Hi,
>>>
>>> On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
>>>> Hi All,
>>>>
>>>> I have an ocfs2  mount point of 3 ceph cluster nodes and suddenly I
>>>> cannot read and write to the mount point although the cluster is clean
>>>> and showing no errors.
>>> 1. What is your ocfs2 shared disk? I mean it's a shared disk exported by
>>> iscsi target, or a ceph rbd device?
>>> 2. Did you check if ocfs2 works well before any read/write? and how?
>>> 3. Could you elaborating more details how the ceph nodes use ocfs2?
>>> 4. Please provide the output of:
>>> #sudo debugfs.ocfs2 -R stats /dev/sda


>>>> Are there any other logs I can check?
>>> All log messages should go to /var/log/messages, could you attach the whole
>>> log file?
>>>
>>> Eric

>>>> There are some log in kern.log about
>>>>
>>>>
>>>> kern.log
>>>>
>>>> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
>>>>
>>>> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
>>>> Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode:  [#1]
>>>> SMP
>>>> Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
>>>> vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
>>>> mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
>>>> xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
>>>> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>> xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
>>>> ebtable_filter ebtables 

Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-13 Thread Ishmael Tsoaela
Hi Eric,

Sorry, here are the other 2 syslogs, if you need them, and the debug output.




nodeD
root@nodeD:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 592 sparse inline-data xattr
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 16
Extended Attributes Inline Size: 256
Label:
UUID: 238F878003E7455FA5B01CC884D1047F
Hash: 919897149 (0x36d4843d)
DX Seed[0]: 0x
DX Seed[1]: 0x
DX Seed[2]: 0x
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
FS Generation: 1754092981 (0x688d55b5)
CRC32:    ECC: 
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 64000
ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
atime: 0x0 -- Thu Jan  1 02:00:00 1970
mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
dtime: 0x0 -- Thu Jan  1 02:00:00 1970
ctime_nsec: 0x -- 0
atime_nsec: 0x -- 0
mtime_nsec: 0x -- 0
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535


nodeB

root@nodeB:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 592 sparse inline-data xattr
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 16
Extended Attributes Inline Size: 256
Label:
UUID: 238F878003E7455FA5B01CC884D1047F
Hash: 919897149 (0x36d4843d)
DX Seed[0]: 0x
DX Seed[1]: 0x
DX Seed[2]: 0x
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
FS Generation: 1754092981 (0x688d55b5)
CRC32:    ECC: 
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 64000
ctime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
atime: 0x0 -- Thu Jan  1 02:00:00 1970
mtime: 0x57a0a2f8 -- Tue Aug  2 15:41:12 2016
dtime: 0x0 -- Thu Jan  1 02:00:00 1970
ctime_nsec: 0x -- 0
atime_nsec: 0x -- 0
mtime_nsec: 0x -- 0
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535
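
A few fields in the stats output above are easier to read once decoded. The sketch below is ordinary shell arithmetic on values copied from that output; the variable names are illustrative. "Block Size Bits" and "Cluster Size Bits" are treated as base-2 exponents, and ctime as seconds since the Unix epoch.

```shell
block_bits=12           # "Block Size Bits: 12"
cluster_bits=12         # "Cluster Size Bits: 12"
ctime_hex=0x57a0a2f8    # "ctime: 0x57a0a2f8"

block_size=$((1 << block_bits))        # 2^12
cluster_size=$((1 << cluster_bits))    # 2^12
ctime_epoch=$(($ctime_hex))            # hex to decimal seconds

echo "block size:   $block_size bytes"      # block size:   4096 bytes
echo "cluster size: $cluster_size bytes"    # cluster size: 4096 bytes
echo "ctime epoch:  $ctime_epoch"           # ctime epoch:  1470145272

# With GNU date (an assumption about the local toolset), the epoch value
# can be rendered in UTC with:
#   date -u -d "@$ctime_epoch"
```

The epoch value corresponds to 2 Aug 2016 13:41:12 UTC, which agrees with the 15:41:12 shown by debugfs.ocfs2 if the nodes run two hours ahead of UTC, as the 02:00:00 shown for a zero atime suggests.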




The request in the snip attached just hangs








On Tue, Sep 13, 2016 at 10:37 AM, Ishmael Tsoaela  wrote:
> Thanks for the response
>
>
> 1.  the disk is a shared ceph rbd device
>
>  #rbd showmapped
> id  pool      image           snap  device
> 1   vmimages  block_vmimages  -     /dev/rbd1
>
>
> 2. ocfs2 has been working well for 2 months now, with a reboot 12 days ago
>
> 3.  3 ceph nodes all have rbd image mapped and ocfs2 mounted
>
> commands used
>
> #sudo rbd map block_vmimages  --pool vmimages --name
>
> #sudo mount /dev/rbd/vmimages/block_vmimages /mnt/vmimages/
> /dev/rbd1
>
> 4.
> root@nodeC:~# sudo debugfs.ocfs2 -R stats /dev/rbd1
> Revision: 0.90
> Mount Count: 0   Max Mount Count: 20
> State: 0   Errors: 0
> Check Interval: 0   Last Check: Tue Aug  2 15:41:12 2016
> Creator OS: 0
> Feature Compat: 3 backup-super strict-journal-super
> Feature Incompat: 592 sparse inline-data xattr
> Tunefs Incomplete: 0
> Feature RO compat: 1 unwritten
> Root Blknum: 5   System Dir Blknum: 6
> First Cluster Group Blknum: 3
> Block Size Bits: 12   Cluster Size Bits: 12
> Max Node Slots: 16
> Extended Attributes Inline Size: 256
> Label:
> UUID: 238F878003E7455FA5B01CC884D1047F
> Hash: 919897149 (0x36d4843d)
> DX Seed[0]: 0x
> DX Seed[1]: 0x
> DX Seed[2]: 0x
> Cluster stack: classic o2cb
> Inode: 2   Mode: 00   Generation: 1754092981 (0x688d55b5)
> FS Generation: 1754092981 

Re: [Ocfs2-users] ocf2 mount point hangs

2016-09-13 Thread Eric Ren
Hi,

On 09/13/2016 03:16 PM, Ishmael Tsoaela wrote:
> Hi All,
>
> I have an ocfs2  mount point of 3 ceph cluster nodes and suddenly I
> cannot read and write to the mount point although the cluster is clean
> and showing no errors.
1. What is your ocfs2 shared disk? I mean, is it a shared disk exported by an
iscsi target, or a ceph rbd device?
2. Did you check whether ocfs2 worked well before any read/write? If so, how?
3. Could you elaborate on how the ceph nodes use ocfs2?
4. Please provide the output of:
#sudo debugfs.ocfs2 -R stats /dev/sda
>
>
> Are the any other logs I can check?
All log messages should go to /var/log/messages. Could you attach the whole
log file?

Eric
>
> There are some log in kern.log about
>
>
> kern.log
>
> Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
> /build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
> Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode:  [#1] SMP
> Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
> vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
> mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
> xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
> iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
> ip_tables x_tables dell_rbu ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue configfs bridge stp llc binfmt_misc
> ipmi_devintf kvm_amd dcdbas kvm input_leds joydev amd64_edac_mod
> crct10dif_pclmul edac_core shpchp i2c_piix4 fam15h_power crc32_pclmul
> edac_mce_amd ipmi_ssif k10temp aesni_intel aes_x86_64 lrw gf128mul
> 8250_fintek glue_helper acpi_power_meter mac_hid serio_raw ablk_helper
> cryptd ipmi_msghandler xfs libcrc32c lp parport ixgbe dca hid_generic
> uas usbhid vxlan usb_storage ip6_udp_tunnel hid udp_tunnel ptp psmouse
> bnx2 pps_core megaraid_sas mdio [last unloaded: ipmi_si]
> Sep 13 08:10:18 nodeB kernel: [1104431.898986] CPU: 10 PID: 65016
> Comm: cp Not tainted 4.2.0-27-generic #32~14.04.1-Ubuntu
> Sep 13 08:10:18 nodeB kernel: [1104432.012469] Hardware name: Dell
> Inc. PowerEdge R515/0RMRF7, BIOS 2.0.2 10/22/2012
> Sep 13 08:10:18 nodeB kernel: [1104432.134659] task: 880a61dca940
> ti: 88084a5ac000 task.ti: 88084a5ac000
> Sep 13 08:10:18 nodeB kernel: [1104432.265260] RIP:
> 0010:[]  []
> _ocfs2_free_suballoc_bits+0x4db/0x4e0 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104432.406559] RSP:
> 0018:88084a5af798  EFLAGS: 00010246
> Sep 13 08:10:18 nodeB kernel: [1104432.479958] RAX: 
> RBX: 881acebcb000 RCX: 881fcd372e00
> Sep 13 08:10:18 nodeB kernel: [1104432.630768] RDX: 881fd0d4dc30
> RSI: 88197e351bc8 RDI: 880fd127b2b0
> Sep 13 08:10:18 nodeB kernel: [1104432.789688] RBP: 88084a5af818
> R08: 0002 R09: 7e00
> Sep 13 08:10:18 nodeB kernel: [1104432.950053] R10: 880d39a21020
> R11: 88084a5af550 R12: 00fa
> Sep 13 08:10:18 nodeB kernel: [1104433.113014] R13: 5ab1
> R14:  R15: 880fb2d43000
> Sep 13 08:10:18 nodeB kernel: [1104433.276484] FS:
> 7fcc68373840() GS:881fdde8()
> knlGS:
> Sep 13 08:10:18 nodeB kernel: [1104433.440016] CS:  0010 DS:  ES:
>  CR0: 8005003b
> Sep 13 08:10:18 nodeB kernel: [1104433.521496] CR2: 5647b2ee6d80
> CR3: 000198b93000 CR4: 000406e0
> Sep 13 08:10:18 nodeB kernel: [1104433.681357] Stack:
> Sep 13 08:10:18 nodeB kernel: [1104433.758498]  
> 880fd127b2e8 881fc6655f08 5bab
> Sep 13 08:10:18 nodeB kernel: [1104433.913655]  881fd0c51d80
> 88197e351bc8 880fd127b330 880e9eaa6000
> Sep 13 08:10:18 nodeB kernel: [1104434.068609]  88197e351bc8
> 817ba6d6 0001 1ac592b1
> Sep 13 08:10:18 nodeB kernel: [1104434.223347] Call Trace:
> Sep 13 08:10:18 nodeB kernel: [1104434.298560]  [] ?
> mutex_lock+0x16/0x37
> Sep 13 08:10:18 nodeB kernel: [1104434.374183]  []
> _ocfs2_free_clusters+0xea/0x200 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.449628]  [] ?
> ocfs2_put_slot+0xe0/0xe0 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.523971]  [] ?
> ocfs2_put_slot+0xe0/0xe0 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.595803]  []
> ocfs2_free_clusters+0x15/0x20 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.14]  []
> __ocfs2_flush_truncate_log+0x247/0x560 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.806017]  [] ?
> ocfs2_num_free_extents+0x56/0x120 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104434.946141]  []
> ocfs2_remove_btree_range+0x4e8/0x760 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104435.086490]  []
> ocfs2_commit_truncate+0x180/0x590 [ocfs2]
> Sep 13 08:10:18 nodeB kernel: [1104435.158189]  [] ?
> ocfs2_allocate_extend_trans+0x130/0x130 

[Ocfs2-users] ocf2 mount point hangs

2016-09-13 Thread Ishmael Tsoaela
Hi All,

I have an ocfs2 mount point shared by 3 ceph cluster nodes, and suddenly I
cannot read from or write to the mount point, although the cluster is clean
and showing no errors.


Are there any other logs I can check?

There are some log entries in kern.log:


kern.log

Sep 13 08:10:18 nodeB kernel: [1104431.300882] kernel BUG at
/build/linux-lts-wily-Vv6Eyd/linux-lts-wily-4.2.0/fs/ocfs2/suballoc.c:2419!
Sep 13 08:10:18 nodeB kernel: [1104431.345504] invalid opcode:  [#1] SMP
Sep 13 08:10:18 nodeB kernel: [1104431.370081] Modules linked in:
vhost_net vhost macvtap macvlan ocfs2 quota_tree rbd libceph ipmi_si
mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase
xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
ip_tables x_tables dell_rbu ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue configfs bridge stp llc binfmt_misc
ipmi_devintf kvm_amd dcdbas kvm input_leds joydev amd64_edac_mod
crct10dif_pclmul edac_core shpchp i2c_piix4 fam15h_power crc32_pclmul
edac_mce_amd ipmi_ssif k10temp aesni_intel aes_x86_64 lrw gf128mul
8250_fintek glue_helper acpi_power_meter mac_hid serio_raw ablk_helper
cryptd ipmi_msghandler xfs libcrc32c lp parport ixgbe dca hid_generic
uas usbhid vxlan usb_storage ip6_udp_tunnel hid udp_tunnel ptp psmouse
bnx2 pps_core megaraid_sas mdio [last unloaded: ipmi_si]
Sep 13 08:10:18 nodeB kernel: [1104431.898986] CPU: 10 PID: 65016
Comm: cp Not tainted 4.2.0-27-generic #32~14.04.1-Ubuntu
Sep 13 08:10:18 nodeB kernel: [1104432.012469] Hardware name: Dell
Inc. PowerEdge R515/0RMRF7, BIOS 2.0.2 10/22/2012
Sep 13 08:10:18 nodeB kernel: [1104432.134659] task: 880a61dca940
ti: 88084a5ac000 task.ti: 88084a5ac000
Sep 13 08:10:18 nodeB kernel: [1104432.265260] RIP:
0010:[]  []
_ocfs2_free_suballoc_bits+0x4db/0x4e0 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104432.406559] RSP:
0018:88084a5af798  EFLAGS: 00010246
Sep 13 08:10:18 nodeB kernel: [1104432.479958] RAX: 
RBX: 881acebcb000 RCX: 881fcd372e00
Sep 13 08:10:18 nodeB kernel: [1104432.630768] RDX: 881fd0d4dc30
RSI: 88197e351bc8 RDI: 880fd127b2b0
Sep 13 08:10:18 nodeB kernel: [1104432.789688] RBP: 88084a5af818
R08: 0002 R09: 7e00
Sep 13 08:10:18 nodeB kernel: [1104432.950053] R10: 880d39a21020
R11: 88084a5af550 R12: 00fa
Sep 13 08:10:18 nodeB kernel: [1104433.113014] R13: 5ab1
R14:  R15: 880fb2d43000
Sep 13 08:10:18 nodeB kernel: [1104433.276484] FS:
7fcc68373840() GS:881fdde8()
knlGS:
Sep 13 08:10:18 nodeB kernel: [1104433.440016] CS:  0010 DS:  ES:
 CR0: 8005003b
Sep 13 08:10:18 nodeB kernel: [1104433.521496] CR2: 5647b2ee6d80
CR3: 000198b93000 CR4: 000406e0
Sep 13 08:10:18 nodeB kernel: [1104433.681357] Stack:
Sep 13 08:10:18 nodeB kernel: [1104433.758498]  
880fd127b2e8 881fc6655f08 5bab
Sep 13 08:10:18 nodeB kernel: [1104433.913655]  881fd0c51d80
88197e351bc8 880fd127b330 880e9eaa6000
Sep 13 08:10:18 nodeB kernel: [1104434.068609]  88197e351bc8
817ba6d6 0001 1ac592b1
Sep 13 08:10:18 nodeB kernel: [1104434.223347] Call Trace:
Sep 13 08:10:18 nodeB kernel: [1104434.298560]  [] ?
mutex_lock+0x16/0x37
Sep 13 08:10:18 nodeB kernel: [1104434.374183]  []
_ocfs2_free_clusters+0xea/0x200 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.449628]  [] ?
ocfs2_put_slot+0xe0/0xe0 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.523971]  [] ?
ocfs2_put_slot+0xe0/0xe0 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.595803]  []
ocfs2_free_clusters+0x15/0x20 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.14]  []
__ocfs2_flush_truncate_log+0x247/0x560 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.806017]  [] ?
ocfs2_num_free_extents+0x56/0x120 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104434.946141]  []
ocfs2_remove_btree_range+0x4e8/0x760 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.086490]  []
ocfs2_commit_truncate+0x180/0x590 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.158189]  [] ?
ocfs2_allocate_extend_trans+0x130/0x130 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.297235]  []
ocfs2_truncate_file+0x39c/0x610 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.368060]  [] ?
ocfs2_read_inode_block+0x10/0x20 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.505117]  []
ocfs2_setattr+0x4b7/0xa50 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.574617]  [] ?
ocfs2_xattr_get+0x9d/0x130 [ocfs2]
Sep 13 08:10:18 nodeB kernel: [1104435.643722]  []
notify_change+0x1ae/0x380
Sep 13 08:10:18 nodeB kernel: [1104435.712037]  []
do_truncate+0x66/0xa0
Sep 13 08:10:18 nodeB kernel: [1104435.778685]  []
path_openat+0x277/0x1330
Sep 13 08:10:18 nodeB kernel: [1104435.845776]  [] ?