----- Original Message -----
> Bob Peterson <rpete...@redhat.com> writes:
>
> > [...]
>
> > Hi Daniel,
> >
> > I'm downloading the metadata now. I'll let you know what I find.
> > It may take a while because my storage is a bit in flux at the moment.
>
> Ok, thanks a lot for looking at our problems.
>
> Regards.
> --
> Daniel Dehennin
> Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Hi Daniel,

I took a look at the metadata you sent me, but I didn't find any evidence relating to the problem you posted. Either the corruption happened long before you saved the metadata, or the metadata was saved after fsck.gfs2 had fixed (or attempted to fix) the problem. One thing's for sure: I don't see any evidence of wild file system corruption, and certainly nothing that could account for those errors.

You said the problem seemed to revolve around a gfs2_grow operation, right? Can you make sure the lvm2 volume group has the clustered bit set? Please run the "vgs" command and see whether that volume group has "c" listed in its attribute flags (the sixth character of the Attr column). If not, that could have caused problems for gfs2_grow.

I've seen problems like this only rarely. One was a legitimate bug in GFS2 that we fixed in RHEL5, but I assume your kernel is newer than that. The other we weren't able to solve, because there was no evidence of what went wrong.

My only working theory is that this might be related to the transition of dinodes from "unlinked" to "free". After a file is deleted, its dinode goes to "unlinked" and later has to be transitioned to "free". This sometimes goes wrong because the transition needs to check what the other nodes in the cluster are doing. For example: if you have three nodes and a file was unlinked on node 1, maybe the inter-node communication got confused and nodes 2 and 3 both tried to transition it from Unlinked to Free.

That is only a theory, and there is absolutely no proof. However, I have a set of experimental patches, not even in the upstream kernel yet (hopefully soon!), that try to tighten up and fix problems like this. It's much more common for multiple nodes to try to transition a file from Unlinked to Free and all fail, leaving the file in the "Unlinked" state.

Regards,

Bob Peterson
Red Hat File Systems

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
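The "vgs" clustered-bit check requested in the reply can be scripted. The Python below is a minimal sketch, not a Red Hat tool: the helper names parse_vgs, is_clustered and check_volume_groups are hypothetical, and it assumes the LVM2 "vgs" binary is on PATH and that "vgs --noheadings -o vg_name,vg_attr" prints one volume group per line. It reads the sixth character of vg_attr rather than searching for any "c", because the fifth character (allocation policy) is also "c" for contiguous allocation.

```python
import subprocess


def parse_vgs(output: str) -> dict:
    """Map each VG name to its vg_attr string.

    Assumes the text printed by `vgs --noheadings -o vg_name,vg_attr`:
    one volume group per line, name and attributes separated by whitespace.
    """
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or unexpected lines
            name, attr = fields
            attrs[name] = attr
    return attrs


def is_clustered(vg_attr: str) -> bool:
    """Return True if the clustered bit is set in a vg_attr string.

    The sixth character is 'c' for a clustered VG ('s' would mean a
    shared VG instead). Checking for 'c' anywhere would be wrong, since
    the fifth character is 'c' when the allocation policy is contiguous.
    """
    return len(vg_attr) >= 6 and vg_attr[5] == "c"


def check_volume_groups() -> dict:
    """Run vgs (needs LVM2, usually root) and report each VG's clustered bit."""
    result = subprocess.run(
        ["vgs", "--noheadings", "-o", "vg_name,vg_attr"],
        capture_output=True, text=True, check=True,
    )
    return {name: is_clustered(attr)
            for name, attr in parse_vgs(result.stdout).items()}
```

On a cluster node, check_volume_groups() returns a mapping from volume group name to a boolean; a False entry for the volume group holding the GFS2 file system would be the case the reply warns about before another gfs2_grow.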