Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-12 Joel Becker
On Mon, Nov 12, 2012 at 01:24:30AM +0200, Laurentiu Gosu wrote:
> We managed to track down the problem: the inodes that hold the
> root directory and the system directory (and probably others, like hb)
> were somehow overwritten (!?).
> Using debugfs and a lot of detective work, Marian found the inode

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-11 Laurentiu Gosu
Hi, we managed to track down the problem: the inodes that hold the root directory and the system directory (and probably others, like hb) were somehow overwritten (!?). Using debugfs and a lot of detective work, Marian found the inode number of one of the subfolders, and then we ran cd .. until the most
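The manual hunt for a surviving inode can be approximated with a read-only scan of the device or an image copy of it. This is a minimal sketch, not a recovery tool, and it rests on stated assumptions: that an intact OCFS2 inode block begins with the signature `INODE01` (the `i_signature` field at the start of `struct ocfs2_dinode`, whose mismatch is what debugfs.ocfs2 appears to report as "Bad magic number in inode"), that OCFS2 inode numbers are block numbers, and that the volume uses 4096-byte blocks (a guess; use the real block size). The names `read_block`, `has_inode_signature`, and `scan_for_inodes` are hypothetical.

```python
import os
from typing import Iterator, Optional

# Assumption: an intact OCFS2 inode block starts with this signature
# (the i_signature field at the beginning of struct ocfs2_dinode).
INODE_SIGNATURE = b"INODE01"


def read_block(fd: int, blkno: int, block_size: int) -> bytes:
    """Read filesystem block `blkno`; block_size must match the volume's (assumed)."""
    return os.pread(fd, block_size, blkno * block_size)


def has_inode_signature(block: bytes) -> bool:
    """True if the block still starts with the inode signature."""
    return block.startswith(INODE_SIGNATURE)


def scan_for_inodes(path: str, block_size: int = 4096,
                    limit: Optional[int] = None) -> Iterator[int]:
    """Yield the numbers of blocks that still look like inodes.

    The device or image is opened read-only, so nothing is modified.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        blkno = 0
        while limit is None or blkno < limit:
            block = read_block(fd, blkno, block_size)
            if len(block) < len(INODE_SIGNATURE):
                break  # end of device or image
            if has_inode_signature(block):
                yield blkno
            blkno += 1
    finally:
        os.close(fd)
```

With a file descriptor opened on the volume, `has_inode_signature(read_block(fd, 129, 4096))` would show whether block 129, the one debugfs complains about, still carries the signature. Reading one block per call keeps the sketch simple but is slow on a large volume; a larger read buffer would be the obvious optimization.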

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Laurentiu Gosu
Hi Sunil, do you have ANY other idea for recovering our data? Maybe you know of some recovery tool that we could use? We really need it. Thank you for your help. Laurentiu.
On 11/10/2012 04:25, Marian Serban wrote:
> debugfs: ls /
> ls: Bad magic number in inode while checking directory at block 129

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Marian Serban
debugfs: ls /
ls: Bad magic number in inode while checking directory at block 129
On 10.11.2012 04:24, Sunil Mushran wrote:
> Yes, that should be enough for that. But that won't help if the real problem is device-related. What does debugfs.ocfs2 -R "ls -l /" return? If that errors, it means the r

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Sunil Mushran
Yes, that should be enough for that. But that won't help if the real problem is device-related. What does debugfs.ocfs2 -R "ls -l /" return? If that errors, it means the root dir is gone. It may be best to look into your backups.
On Fri, Nov 9, 2012 at 6:01 PM, Marian Serban wrote:
> Nope, rdump does

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Marian Serban
Nope, rdump doesn't work either.
debugfs: rdump -v / /tmp
Copying to /tmp/
rdump: Bad magic number in inode while reading inode 129
rdump: Bad magic number in inode while recursively dumping inode 129
Could you please confirm that it's enough to just force the return value of 0 at "ocfs2_valid

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Sunil Mushran
If the global bitmap is gone, then the fs is unusable. But you can extract data using the rdump command in debugfs.ocfs2. Success depends on how much of the device is still usable.
On Fri, Nov 9, 2012 at 5:50 PM, Marian Serban wrote:
> I tried hacking the fsck.ocfs2 source code by not consideri

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Marian Serban
I tried hacking the fsck.ocfs2 source code so that it ignores the metaecc flag. Then I ran into a journal recovery error:
Bad magic number in inode while looking up the journal inode for slot 0
fsck encountered unrecoverable errors while replaying the journals and will not continue
After bypassing journ

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Marian Serban
That's the kernel:
Linux ro02xsrv003.bv.easic.ro 2.6.39.4 #6 SMP Mon Dec 12 12:09:49 EET 2011 x86_64 x86_64 x86_64 GNU/Linux
Anyway, I tried disabling the metaecc feature; no luck.
[root@ro02xsrv003 ~]# tunefs.ocfs2 --fs-features=nometaecc /dev/mapper/volgr1-lvol0
tunefs.ocfs2: I/O error on

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Sunil Mushran
It's either that or a checksum problem. Disable metaecc. I'm not sure which kernel you are running. We fixed a few problems around this a few years ago. If your kernel is older, then it could be a known issue.
On Fri, Nov 9, 2012 at 12:50 PM, Marian Serban wrote:
> Hi Sunil,
>
> Thank you for answ

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Marian Serban
Hi Sunil, thank you for answering. Unfortunately, it doesn't seem to be a hardware problem. There's no way a cable can be loose, because it's an iSCSI over 1G Ethernet (copper wires) environment. Also, I ran "dd if=/dev/ of=/dev/null" and the first 16GB or so are fine. "dmesg" shows no e
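The dd test reads the whole device once and relies on dmesg to reveal failures. A minimal read-only sketch of the same check, which also reports where the first failing read happens, might look like this. Assumptions: the caller has read access to the block device (for example the /dev/mapper/volgr1-lvol0 path from this thread), a failed sector surfaces as an `OSError` with `errno.EIO`, and the 1 MiB chunk size is an arbitrary choice; `first_unreadable_offset` is a hypothetical name.

```python
import errno
import os
from typing import Optional


def first_unreadable_offset(path: str, chunk_size: int = 1 << 20,
                            limit: Optional[int] = None) -> Optional[int]:
    """Read `path` sequentially and read-only, like dd of=/dev/null.

    Returns the byte offset of the first chunk whose read fails with EIO,
    or None if everything up to `limit` bytes (or end of device) was read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while limit is None or offset < limit:
            try:
                chunk = os.pread(fd, chunk_size, offset)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    # The bad region lies within [offset, offset + chunk_size).
                    return offset
                raise  # other errors (e.g. permissions) are not media errors
            if not chunk:
                return None  # end of device reached without an I/O error
            offset += len(chunk)
        return None
    finally:
        os.close(fd)
```

A `None` result only says the blocks were readable at the block layer; it cannot detect metadata that reads back cleanly but was overwritten, which is the situation the later messages in this thread describe.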

Re: [Ocfs2-users] Huge Problem ocfs2

2012-11-09 Sunil Mushran
An I/O error on the channel means the system cannot talk to the block device. The problem is in the block layer; maybe a loose cable or a setup problem. dmesg should show errors.
On Fri, Nov 9, 2012 at 10:46 AM, Laurentiu Gosu wrote:
> Hi,
> I'm using ocfs2 cluster in a production environment since al

[Ocfs2-users] Huge Problem ocfs2

2012-11-09 Laurentiu Gosu
Hi, I've been using an ocfs2 cluster in a production environment for almost 1 year. During this time, I had to run fsck.ocfs2 a few months ago due to some errors, but they were fixed. Now I have a big problem: I'm not able to mount the volume on any of the nodes. I stopped all nodes except one. Some out