Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Sunil Mushran
An I/O error on channel means the system cannot talk to the block device. The
problem is in the block layer, maybe a loose cable or a setup problem.
dmesg should show errors.
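
One way to act on this is to filter a saved copy of the kernel log for block-layer complaints. The following is a minimal sketch, assuming the dmesg output has been captured as text; the substrings it looks for are assumptions about typical kernel wording, not an exhaustive or authoritative list.

```python
# Substrings that commonly show up in kernel messages about failing block
# devices or iSCSI sessions. These are assumptions, not an exhaustive list.
SUSPECT_SUBSTRINGS = ("i/o error", "conn error", "medium error")


def suspicious_lines(dmesg_text):
    """Return the lines of a saved dmesg dump that mention a suspect substring."""
    hits = []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if any(s in lowered for s in SUSPECT_SUBSTRINGS):
            hits.append(line)
    return hits
```

An empty result is consistent with a healthy block layer but does not prove it; it only means none of the assumed substrings appeared.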


On Fri, Nov 9, 2012 at 10:46 AM, Laurentiu Gosu l...@easic.ro wrote:

  Hi,
 I have been using an ocfs2 cluster in a production environment for almost a year.
 During this time I had to run fsck.ocfs2 once, a few months ago, due to some
 errors, but they were fixed.
 Now I have a big problem: I'm not able to mount the volume on any of the
 nodes. I stopped all nodes except one. Some output below:
 mount /mnt/ocfs2
 mount.ocfs2: I/O error on channel while trying to determine heartbeat information

 fsck.ocfs2 /dev/mapper/volgr1-lvol0
 fsck.ocfs2 1.6.3
 fsck.ocfs2: I/O error on channel while initializing the DLM

 fsck.ocfs2 -n /dev/mapper/volgr1-lvol0
 fsck.ocfs2 1.6.3
 Checking OCFS2 filesystem in /dev/mapper/volgr1-lvol0:
   Label:  SAN
   UUID:   B4CF8D4667AF43118F3324567B90A987
   Number of blocks:   2901788672
   Block size: 4096
   Number of clusters: 45340448
   Cluster size:   262144
   Number of slots: 10

 journal recovery: I/O error on channel while looking up the journal inode for slot 0
 fsck encountered unrecoverable errors while replaying the journals and will not continue
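
The geometry reported by fsck.ocfs2 -n can be cross-checked with a little arithmetic. This is only a sanity check on the numbers printed above, sketched in Python; if the block count and the cluster count describe the same number of bytes, the superblock fields are at least self-consistent.

```python
# Values copied from the fsck.ocfs2 -n output above.
BLOCKS = 2901788672
BLOCK_SIZE = 4096
CLUSTERS = 45340448
CLUSTER_SIZE = 262144

bytes_by_blocks = BLOCKS * BLOCK_SIZE
bytes_by_clusters = CLUSTERS * CLUSTER_SIZE

# Both ways of counting should describe the same volume size.
geometry_consistent = bytes_by_blocks == bytes_by_clusters
size_tib = bytes_by_blocks / 2**40

print(geometry_consistent, round(size_tib, 2))  # True 10.81
```

The two counts agree, which suggests the superblock itself was read intact even though later metadata lookups failed.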


 Can you give me some hints on how to debug the problem?

 Thank you,
 Laurentiu.

 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 https://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Marian Serban

Hi Sunil,

Thank you for answering. Unfortunately, it doesn't seem to be a
hardware problem. There's no way a cable can be loose because it's an iSCSI
over 1G Ethernet (copper wires) environment. I also ran dd
if=/dev/ of=/dev/null, and the first 16GB or so read fine. dmesg shows
no errors.
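
The dd test can be extended to report where, if anywhere, reads start failing. The following is a rough Python sketch, not a replacement for dd; the function name and parameters are made up for illustration. Note that a clean read only shows that the device returns data, not that the data is what the filesystem wrote.

```python
import os


def first_unreadable_offset(path, chunk_size=1 << 20, limit=None):
    """Read `path` sequentially; return the offset of the first failing read,
    or None if everything up to `limit` bytes (or end of file) was readable."""
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while limit is None or offset < limit:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                return offset  # the kernel reported a read error here
            if not data:
                return None  # reached the end without errors
            offset += len(data)
        return None
    finally:
        os.close(fd)
```

Called on the logical volume with no limit, this would walk the whole device, which takes a long time on a volume of this size; passing a limit restricts the scan to the first part of the device.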



Also tried with debugfs.ocfs2:


[root@ro02xsrv003 ~]# debugfs.ocfs2  /dev/mapper/volgr1-lvol0
debugfs.ocfs2 1.6.3
debugfs: ls
ls: Bad magic number in inode '.'
debugfs: slotmap
slotmap: Bad magic number in inode while reading slotmap system file
debugfs: stats
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Fri Nov  9 14:35:53 2012
Creator OS: 0
Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 16208 sparse extended-slotmap inline-data metaecc xattr indexed-dirs refcount discontig-bg
Tunefs Incomplete: 0
Feature RO compat: 7 unwritten usrquota grpquota
Root Blknum: 129   System Dir Blknum: 130
First Cluster Group Blknum: 64
Block Size Bits: 12   Cluster Size Bits: 18
Max Node Slots: 10
Extended Attributes Inline Size: 256
Label: SAN
UUID: B4CF8D4667AF43118F3324567B90A987
Hash: 3698209293 (0xdc6e320d)
DX Seed[0]: 0x9f4a2bb7
DX Seed[1]: 0x501ddac0
DX Seed[2]: 0x6034bfe8
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 1093568923 (0x412e899b)
FS Generation: 1093568923 (0x412e899b)
CRC32: 46f2d360   ECC: 04d4
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 45340448
ctime: 0x4ee67f67 -- Tue Dec 13 00:25:43 2011
atime: 0x0 -- Thu Jan  1 02:00:00 1970
mtime: 0x4ee67f67 -- Tue Dec 13 00:25:43 2011
dtime: 0x0 -- Thu Jan  1 02:00:00 1970
ctime_nsec: 0x -- 0
atime_nsec: 0x -- 0
mtime_nsec: 0x -- 0
Refcount Block: 0
Last Extblk: 0   Orphan Slot: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535
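
The numeric Feature Incompat value in the stats output can be decoded by hand to confirm that metaecc really is set and that no unexpected bits are present. A small sketch in Python; the bit values are listed from memory of the OCFS2 on-disk format header (ocfs2_fs.h) and should be treated as assumptions to verify against the source tree.

```python
# Incompat feature bits as defined in the OCFS2 on-disk format header
# (ocfs2_fs.h); listed here from memory, so verify against your source tree.
INCOMPAT_FLAGS = {
    0x0010: "sparse",
    0x0040: "inline-data",
    0x0100: "extended-slotmap",
    0x0200: "xattr",
    0x0400: "indexed-dirs",
    0x0800: "metaecc",
    0x1000: "refcount",
    0x2000: "discontig-bg",
}


def decode_incompat(value):
    """Split a Feature Incompat value into known flag names and leftover bits."""
    names = [name for bit, name in sorted(INCOMPAT_FLAGS.items()) if value & bit]
    # The keys are distinct single bits, so their sum equals their bitwise OR.
    unknown = value & ~sum(INCOMPAT_FLAGS)
    return names, unknown


names, unknown = decode_incompat(16208)
print(names, hex(unknown))
```

With the value 16208 from the output above, all eight names printed by debugfs.ocfs2 are accounted for and no bits are left over, so the feature field itself looks sane.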




Marian




Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Marian Serban
I tried hacking the fsck.ocfs2 source code so that it ignores the metaecc
flag. Then I ran into


journal recovery: Bad magic number in inode while looking up the journal inode for slot 0
fsck encountered unrecoverable errors while replaying the journals and will not continue


After bypassing the journal replay function, I got

Pass 0a: Checking cluster allocation chains
pass0: Bad magic number in inode while looking up the global bitmap inode
fsck.ocfs2: Bad magic number in inode while performing pass 0


Does it mean the filesystem is destroyed completely?



On 10.11.2012 02:54, Marian Serban wrote:

That's the kernel:

Linux ro02xsrv003.bv.easic.ro 2.6.39.4 #6 SMP Mon Dec 12 12:09:49 EET 2011 x86_64 x86_64 x86_64 GNU/Linux


Anyway, I tried disabling the metaecc feature, with no luck.

[root@ro02xsrv003 ~]# tunefs.ocfs2 --fs-features=nometaecc /dev/mapper/volgr1-lvol0
tunefs.ocfs2: I/O error on channel while opening device "/dev/mapper/volgr1-lvol0"


These are the last lines of strace output for the tunefs.ocfs2 command:




open("/sys/fs/ocfs2/cluster_stack", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54aad05000
read(4, "o2cb\n", 4096) = 5
close(4) = 0
munmap(0x7f54aad05000, 4096) = 0
open("/sys/fs/o2cb/interface_revision", O_RDONLY) = 4
read(4, "5\n", 15) = 2
read(4, "", 13) = 0
close(4) = 0
stat("/sys/kernel/config", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
statfs("/sys/kernel/config", {f_type=0x62656570, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open("/dev/mapper/volgr1-lvol0", O_RDONLY) = 4
ioctl(4, BLKSSZGET, 0x7fffce711454) = 0
close(4) = 0
pread(3, "\0\0\v\25\37\1\200\200\202@\21\2\30\26\0\0\0,\17\272\241\4\340\210\311\377\17\300\327\332\373\17"..., 4096, 532480) = 4096
close(3) = 0
write(2, "tunefs.ocfs2", 12tunefs.ocfs2) = 12
write(2, ": ", 2: ) = 2
write(2, "I/O error on channel", 20I/O error on channel) = 20
write(2, " ", 1 ) = 1
write(2, "while opening device \"/dev/mappe"..., 47while opening device "/dev/mapper/volgr1-lvol0") = 47
write(2, "\r\n", 2
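
The pread() call in this trace is worth a closer look. The sketch below, in Python, works out which filesystem block the offset corresponds to and checks whether the returned bytes begin with the signature a valid OCFS2 inode block is expected to carry. The signature string "INODE01" is an assumption taken from memory of the OCFS2 on-disk format, not something shown in this thread.

```python
BLOCK_SIZE = 4096          # from "Block Size Bits: 12" in the stats output
PREAD_OFFSET = 532480      # offset argument of the pread() call in the strace
SYSTEM_DIR_BLKNUM = 130    # "System Dir Blknum" from the stats output

# First 32 bytes returned by that pread(), copied from the strace output.
HEAD = (b"\0\0\v\25\37\1\200\200\202@\21\2\30\26\0\0\0,"
        b"\17\272\241\4\340\210\311\377\17\300\327\332\373\17")

# Signature a valid OCFS2 inode block is expected to start with (assumption).
INODE_SIGNATURE = b"INODE01"

block_number = PREAD_OFFSET // BLOCK_SIZE
reads_system_dir = block_number == SYSTEM_DIR_BLKNUM
looks_like_inode = HEAD.startswith(INODE_SIGNATURE)

print(block_number, reads_system_dir, looks_like_inode)  # 130 True False
```

The read lands on block 130, the system directory inode, and the kernel returned all 4096 bytes without error, yet the data does not start with the expected signature. That suggests the "I/O error on channel" message here is raised by the userspace tools when validating the block, rather than by a failed read from the device.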





On 10.11.2012 02:06, Sunil Mushran wrote:
It's either that or a checksum problem. Disable metaecc. Not sure
which kernel you are running.
We fixed a few problems in this area a few years ago. If your kernel
is older, then it could be a known issue.



Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Sunil Mushran
If the global bitmap is gone, then the fs is unusable. But you can extract data
using the rdump command in debugfs.ocfs2. Success depends on how much of the
device is still usable.



Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Marian Serban

Nope, rdump doesn't work either.

debugfs: rdump -v / /tmp
Copying to /tmp/
rdump: Bad magic number in inode while reading inode 129
rdump: Bad magic number in inode while recursively dumping inode 129


Could you please confirm that forcing ocfs2_validate_meta_ecc to return 0
is enough to bypass the ECC checks?





Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Sunil Mushran
Yes, that should be enough for that. But that won't help if the real problem
is device related.

What does debugfs.ocfs2 -R "ls -l /" return? If that errors, it means the root
dir is gone. Maybe best to look into your backups.



Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Marian Serban

debugfs: ls /
ls: Bad magic number in inode while checking directory at block 129




Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Thread Laurentiu Gosu

Hi Sunil,
Do you have ANY other ideas for recovering our data? Maybe you know of some
recovery tool that we could use? We really need it.

Thank you for your help.
Laurentiu.
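
This is not a recovery tool, but one rough way to gauge how much metadata survives is to scan the device, or preferably an image copy of it, for blocks that still begin with the inode signature. The Python sketch below opens the device read-only; the signature "INODE01" is an assumption taken from memory of the OCFS2 on-disk format, and the function name and parameters are hypothetical.

```python
INODE_SIGNATURE = b"INODE01"  # assumed start of a valid OCFS2 inode block


def count_inode_blocks(path, block_size=4096, max_blocks=None):
    """Count blocks in `path` that begin with the inode signature.

    Returns (blocks_scanned, blocks_with_signature)."""
    scanned = found = 0
    with open(path, "rb") as dev:
        while max_blocks is None or scanned < max_blocks:
            block = dev.read(block_size)
            if len(block) < block_size:
                break  # end of device or a short trailing block
            scanned += 1
            if block.startswith(INODE_SIGNATURE):
                found += 1
    return scanned, found
```

If a scan of the first few million blocks finds many such blocks, individual inodes may still be intact even though blocks 129 and 130 are not. If it finds none, the damage is probably more widespread. Reading one block at a time is slow on a volume of this size, so max_blocks is there to limit a first pass.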
