[Ocfs2-users] fsck doesn't fix bad chain

2011-09-16 Thread Andre Nathan
Hello For a while I had seen errors like this in the kernel logs: OCFS2: ERROR (device drbd5): ocfs2_validate_gd_parent: Group descriptor #69084874 has bad chain 126 File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is

Re: [Ocfs2-users] No space left on device

2011-04-27 Thread Andre Nathan
On Wed, 2011-04-27 at 12:47 -0300, Andre Nathan wrote: I have tried running fsck on both nodes, but it doesn't fix the problem. Is there something else I can try? I managed to fix it by installing ocfs2-tools 1.6.4 and enabling the discontig-bg feature. Regards, Andre

Re: [Ocfs2-users] No space left on device

2011-04-27 Thread Andre Nathan
On Wed, 2011-04-27 at 21:15 +0200, Stefan Priebe - Profihost AG wrote: since which kernel version is this supported? 2.6.35, I believe (luckily the exact version I had installed). Is there a doc about it? Not that I could find. I did find this email from ocfs2-devel though, pointing to the

[Ocfs2-users] Unmounting in one node affects the other node

2011-02-09 Thread Andre Nathan
Hello, I have an active-active DRBD+OCFS2 cluster running Dovecot as an IMAP server. Currently our load balancer is configured to forward connections to only one of the nodes, as explained in bug #1297. The shared filesystems are still mounted on both machines, though. Today we had an issue at

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-12-17 Thread Andre Nathan
I've just updated the bug report with a link to two VirtualBox VMs and instructions to reproduce the bug using them. I hope this helps with finding the problem. Thanks Andre On Thu, 2010-11-04 at 14:06 -0200, Andre Nathan wrote: Hello Sunil I have created a simpler environment for testing

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-12-07 Thread Andre Nathan
On Wed, 2010-11-10 at 11:35 -0200, Andre Nathan wrote: I have detailed all the errors at http://oss.oracle.com/bugzilla/show_bug.cgi?id=1297 Does anyone know what can be a possible trigger for the orphan inodes problem, or what exactly an orphan inode is? I'm trying to write a script

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-11-10 Thread Andre Nathan
On Thu, 2010-10-28 at 14:21 -0700, Joel Becker wrote: I'm starting to think that DRBD isn't keeping a consistent view of the devices between your servers. I managed to reproduce these errors in a test setup, including a configuration where I replaced the active-active DRBD by a shared AoE

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-11-04 Thread Andre Nathan
Sunil Mushran wrote: On 10/26/2010 10:39 AM, Andre Nathan wrote: On Tue, 2010-10-26 at 10:14 -0700, Sunil Mushran wrote: So the backup server is not part of the cluster but yet reading the same block device. As long as it is only reading, it should not affect the two nodes, but I

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-10-27 Thread Andre Nathan
Hello Sunil The errors happened again, but now I think it may be completely fixed. I only got the -17 error for a single inode this time: # grep -E 'Oct 2[78]' /var/log/kern.log | grep -oE 'ERROR: Inode [0-9]+' | sort | uniq -c 35 ERROR: Inode 16671031 I ran fsck.ocfs2 -y -f on all my volumes. I got
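
The pipeline quoted in the message above can be wrapped in a small shell function so the date pattern and log path are easy to change. This is only a sketch: the function name count_inode_errors and its two parameters are hypothetical, and it assumes the kernel log lines contain the literal text "ERROR: Inode <number>" as shown in the message.

```shell
# Hypothetical helper (not part of ocfs2-tools): count how many times each
# inode number appears in "ERROR: Inode <n>" messages within a log file.
count_inode_errors() {
    # $1: extended regex selecting the dates of interest, e.g. 'Oct 2[78]'
    # $2: path to the kernel log, e.g. /var/log/kern.log
    grep -E "$1" "$2" \
        | grep -oE 'ERROR: Inode [0-9]+' \
        | sort \
        | uniq -c
}
```

Calling it as count_inode_errors 'Oct 2[78]' /var/log/kern.log should print one line per distinct inode, prefixed by the number of matching messages. Quoting both patterns matters: without the quotes the shell splits 'Oct 2[78]' into two arguments and treats [0-9]+ as a glob.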

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-10-26 Thread Andre Nathan
On Thu, 2010-10-21 at 10:52 -0700, Sunil Mushran wrote: That said, the first issue (-17) is a known one that was fixed in 2.6.34. I'm still seeing the same messages even after upgrading to 2.6.35: [96387.917789] (ocfs2_wq,3467,0):ocfs2_delete_inode:1043 ERROR: status = -17 [96387.920621]

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-10-26 Thread Andre Nathan
On Tue, 2010-10-26 at 10:14 -0700, Sunil Mushran wrote: So the backup server is not part of the cluster but yet reading the same block device. As long as it is only reading, it should not affect the two nodes, but I will not trust the backup. The error mentions the inode#. Has it changed?

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-10-26 Thread Andre Nathan
On Tue, 2010-10-26 at 10:14 -0700, Sunil Mushran wrote: So the backup server is not part of the cluster but yet reading the same block device. As long as it is only reading, it should not affect the two nodes, but I will not trust the backup. The backup server is not part of the cluster, but

Re: [Ocfs2-users] ocfs2_delete_inode kernel bug

2010-10-26 Thread Andre Nathan
On Tue, 2010-10-26 at 17:17 -0700, Sunil Mushran wrote: That means the fsck fixed the older orphaned inodes. That it is happening with 10.10 is troubling. Are you sure all nodes are on 10.10? The nodes are running 10.04 but I have recompiled the kernel, ocfs2-tools and drbd-tools packages with

[Ocfs2-users] General protection fault

2010-09-17 Thread Andre Nathan
Hello I have an active-active DRBD cluster using OCFS2 as the filesystem on the drbd devices. I started getting a general protection fault error when trying to mount any one of the ocfs2 volumes I have, even when running mount on a single node, with no mounted FSs on the other node. The kernel