Hi,

On Wed, Sep 21, 2016 at 10:12:44PM +0530, Ravishankar N wrote:
> On 09/21/2016 06:45 PM, Pasi Kärkkäinen wrote:
> >Hello,
> >
> >I have a pretty basic two-node gluster 3.7 setup, with a volume 
> >replicated/mirrored to both servers.
> >
> >One of the servers was down for hardware maintenance, and later when it got 
> >back up, the healing process started, re-syncing files.
> >In the beginning there were some 200 files that needed to be synced, and now 
> >the number of files is down to 10, but the last 10 files don't seem 
> >to get synced.
> >
> >So the problem is the healing/re-sync never ends for these files..
> >
> >
> ># gluster volume heal gvol1 info
> >Brick gnode1:/bricks/vol1/brick1
> >/foo
> >/ - Possibly undergoing heal
> >
> >/foo6
> >/foo8
> >/foo7
> >/foo9
> >/foo2
> >/foo5
> >/foo4
> >/foo3
> >Status: Connected
> >Number of entries: 10
> >
> >Brick gnode2:/bricks/vol1/brick1
> >/
> >Status: Connected
> >Number of entries: 1
> >
> >
> >In the brick logs for the volume I see these errors repeating:
> >
> >[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup] 
> >0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data 
> >available]
> >[2016-09-21 12:41:43.063266] E [MSGID: 115050] 
> >[server-rpc-fops.c:179:server_lookup_cbk] 0-gvol1-server: 1484202: LOOKUP 
> >/foo (00000000-0000-0000-0000-000000000001/foo) ==> (No data available) [No 
> >data available]
> >
> >
> >Any idea what might cause those errors?  (/foo is exactly the file that is 
> >being healed, but fails to heal)
> >Any tricks to try?
> 
> Can you check if the 'trusted.gfid' xattr is present for those files
> on the bricks and the files also have the associated hardlink inside
> .glusterfs? You can refer to 
> https://joejulian.name/blog/what-is-this-new-glusterfs-directory-in-33/
> if you are not familiar with the .glusterfs directory.
> 

Let's see.

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000

So hmm.. there is no trusted.gfid on this brick, it seems. Is that perhaps 
because this node was down when the file was created?


On the other node:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e
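As a reading aid for the trusted.afr.* values above: as far as I understand the AFR changelog format, each value is 12 bytes holding three big-endian 32-bit counters for pending data, metadata, and entry operations. That layout is an assumption on my part, not something confirmed in this thread, and the helper name decode_afr is made up for illustration. A minimal Python sketch:

```python
import struct


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters.

    Assumes the layout is three big-endian unsigned 32-bit integers in the
    order data, metadata, entry.
    """
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Value of trusted.afr.gvol1-client-1 from the getfattr output above.
print(decode_afr("0x000016620000000100000000"))
```

If that reading is right, this brick records pending data and metadata operations against the brick behind gvol1-client-1, and no pending entry operations.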

So there we have the gfid.. 
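Regarding the hardlink inside .glusterfs that Ravi asked about: as I understand the layout described in the linked blog post, the link for a regular file sits at .glusterfs/<first two hex digits>/<next two hex digits>/<gfid in UUID form> under the brick root. A small Python sketch that derives the path from the hex value above (the function name gfid_backend_path is hypothetical; the brick path is the one used in this thread):

```python
import posixpath
import uuid


def gfid_backend_path(brick_root, gfid_hex):
    """Return the expected .glusterfs path for a trusted.gfid hex value."""
    if gfid_hex.startswith("0x"):
        gfid_hex = gfid_hex[2:]
    # Canonical 8-4-4-4-12 UUID form, lowercase.
    gfid = str(uuid.UUID(hex=gfid_hex))
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


print(gfid_backend_path("/bricks/vol1/brick1",
                        "0xc1ca778ed2af4828b981171c0c5bd45e"))
```

On the brick that has the gfid, running ls -li on that path and on /bricks/vol1/brick1/foo should show the same inode number if the hardlink is in place.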

How do I fix this and allow the healing process to finish?


Thanks,

-- Pasi

> -Ravi
> 
> >
> >Software versions: CentOS 7 with gluster37 repo (running Gluster 3.7.15), 
> >and nfs-ganesha 2.3.3.
> >
> >
> >Thanks a lot,
> >
> >-- Pasi
> >

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
