Hello folks. A new conundrum to make sure that my life with GlusterFS doesn't
become boring :-)
Configuration is at the end of this message.
On client - directory appears to be empty:
# ls -l /pfs2/online_archive/2011/01
total 0
# fgrep -C 2 inode /var/log/glusterfs/pfs2.log | tail -10
[2011-05-18 14:40:11.665045] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-05-18 14:43:47.810045] E [rpc-clnt.c:199:call_bail] 0-pfs-ro1-client-1: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x130824x sent = 2011-05-18 14:13:45.978987. timeout = 1800
[2011-05-18 14:53:12.311323] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
[2011-05-18 15:00:32.240373] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-05-18 15:10:12.282848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
--
[2011-05-19 10:10:25.967246] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-05-19 10:20:18.551953] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-05-19 10:29:34.834256] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
[2011-05-19 10:30:06.898152] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-05-19 10:32:05.258799] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
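If I understand the "possible split-brain" message correctly, AFR cannot pick a self-heal source because each replica's trusted.afr changelog accuses the other of being stale. A toy model of that decision (my own simplification, not the actual GlusterFS code):

```python
def afr_sources(pending):
    """Toy model of AFR self-heal source selection (a simplification,
    not GlusterFS code).

    pending[i][j] > 0 means replica i recorded operations that never
    completed on replica j, i.e. i "accuses" j of being stale.  A
    replica qualifies as a source only if no other replica accuses it;
    if every replica is accused, there is no source -- the "possible
    split-brain" case in the log above.
    """
    n = len(pending)
    accused = {j for i in range(n) for j in range(n)
               if i != j and pending[i][j] > 0}
    return [i for i in range(n) if i not in accused]

# Mutual accusation -> no source (split-brain); a one-sided
# accusation -> the unaccused replica heals the other.
```

So presumably the fix the message asks for is to make one copy's metadata authoritative so a clean source exists again.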
On server - directory is populated:
# loop_check 'ls -l /export/read-only/g*/online_archive/2011/01' jc1letgfs{14,15,17,18} | less
jc1letgfs14
/export/read-only/g01/online_archive/2011/01:
total 80
drwxrwxrwt 3 403 1009 4096 May 4 10:35 03
drwxrwxrwt 3 107421 1009 4096 May 7 12:18 04
drwxrwxrwt 3 107421 1009 4096 May 4 10:35 05
drwxrwxrwt 3 107421 1009 4096 May 4 10:36 06
drwxrwxrwt 3 107421 1009 4096 May 4 10:36 07
drwxrwxrwt 3 107421 1009 4096 May 4 10:41 10
drwxrwxrwt 3 107421 1009 4096 May 4 10:37 11
drwxrwxrwt 3 107421 1009 4096 May 4 10:43 12
drwxrwxrwt 3 107421 1009 4096 May 4 10:43 13
drwxrwxrwt 3 107421 1009 4096 May 4 10:44 14
drwxrwxrwt 3 107421 1009 4096 May 4 10:46 18
drwxrwxrwt 3 107421 1009 4096 Apr 14 14:11 19
drwxrwxrwt 3 107421 1009 4096 May 4 10:43 20
drwxrwxrwt 3 107421 1009 4096 May 4 10:49 21
drwxrwxrwt 3 107421 1009 4096 May 4 10:45 24
drwxrwxrwt 3 107421 1009 4096 May 4 10:47 25
drwxrwxrwt 3 107421 1009 4096 May 4 10:52 26
drwxrwxrwt 3 107421 1009 4096 May 4 10:49 27
drwxrwxrwt 3 107421 1009 4096 May 4 10:50 28
drwxrwxrwt 3 107421 1009 4096 May 4 10:56 31
(and it looks the same on every brick)
And from the server logs:
root@jc1letgfs17:/var/log/glusterfs# fgrep '2011-05-19 10:39:30' bricks/export-read-only-g*.log
[2011-05-19 10:39:30.306661] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
[2011-05-19 10:39:30.307754] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
[2011-05-19 10:39:30.308230] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
[2011-05-19 10:39:30.322342] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
[2011-05-19 10:39:30.421298] E [posix.c:438:posix_lookup] 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data available
The only two things that jump out so far are: the permissions on the directories under /export/read-only/g01/online_archive/2011/01 are 1777 (drwxrwxrwt), whereas the directories under /export/read-only/g01/online_archive/2010/01 are just 755; and the lstat "No data available" errors only seem to appear on the problem directories.
Any hints or suggestions would be greatly appreciated. Thanks, James
Config:
All on Gluster 3.1.3
Servers:
4 CentOS 5.5 (ProLiant DL370 G6 servers, Intel Xeon 3200 MHz),
Each with:
Single P812 Smart Array Controller,
Single MDS600 with 70 2TB SATA drives configured as RAID 50
48 GB RAM
Clients:
185 CentOS 5.2 (mostly DL360 G6).
/pfs2 is the mount point for a Distributed-Replicate volume across the 4 servers.
Volume Name: pfs-ro1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 20 x 2 = 40
Transport-type: tcp
Bricks:
Brick1: jc1letgfs17-pfs1:/export/read-only/g01
Brick2: jc1letgfs18-pfs1:/export/read-only/g01
Brick3: jc1letgfs17-pfs1:/export/read-only/g02
Brick4: jc1letgfs18-pfs1:/export/read-only/g02
Brick5: jc1letgfs17-pfs1:/export/read-only/g03
Brick6: jc1letgfs18-pfs1:/export/read-only/g03
Brick7: jc1letgfs17-pfs1:/export/read-only/g04
Brick8: jc1letgfs18-pfs1:/export/read-only/g04
Brick9: jc1letgfs17-pfs1:/export/read-only/g05
Brick10: jc1letgfs18-pfs1:/export/read-only/g05
Brick11: jc1letgfs17-pfs1:/export/read-only/g06
Brick12: jc1letgfs18-pfs1:/export/read-only/g06
Brick13: jc1letgfs17-pfs1:/export/read-only/g07
Brick14: jc1letgfs18-pfs1:/export/read-only/g07
Brick15: jc1letgfs17-pfs1:/export/read-only/g08
Brick16: jc1letgfs18-pfs1:/export/read-only/g08
Brick17: jc1letgfs17-pfs1:/export/read-only/g09
Brick18: jc1letgfs18-pfs1:/export/read-only/g09
Brick19: jc1letgfs17-pfs1:/export/read-only/g10
Brick20: jc1letgfs18-pfs1:/export/read-only/g10
Brick21: jc1letgfs14-pfs1:/export/read-only/g01
Brick22: jc1letgfs15-pfs1:/export/read-only/g01
Brick23: jc1letgfs14-pfs1:/export/read-only/g02
Brick24: jc1letgfs15-pfs1:/export/read-only/g02
Brick25: jc1letgfs14-pfs1:/export/read-only/g03
Brick26: jc1letgfs15-pfs1:/export/read-only/g03
Brick27: jc1letgfs14-pfs1:/export/read-only/g04
Brick28: jc1letgfs15-pfs1:/export/read-only/g04
Brick29: jc1letgfs14-pfs1:/export/read-only/g05
Brick30: jc1letgfs15-pfs1:/export/read-only/g05
Brick31: jc1letgfs14-pfs1:/export/read-only/g06
Brick32: jc1letgfs15-pfs1:/export/read-only/g06
Brick33: jc1letgfs14-pfs1:/export/read-only/g07
Brick34: jc1letgfs15-pfs1:/export/read-only/g07
Brick35: jc1letgfs14-pfs1:/export/read-only/g08
Brick36: jc1letgfs15-pfs1:/export/read-only/g08
Brick37: jc1letgfs14-pfs1:/export/read-only/g09
Brick38: jc1letgfs15-pfs1:/export/read-only/g09
Brick39: jc1letgfs14-pfs1:/export/read-only/g10
Brick40: jc1letgfs15-pfs1:/export/read-only/g10
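In case it helps map the log messages to bricks: with replica 2, consecutive bricks in the list above form each AFR subvolume, numbered from zero, so pfs-ro1-replicate-0 should be Brick1/Brick2 and pfs-ro1-replicate-6 should be Brick13/Brick14 (g07 on jc1letgfs17/18). A sketch of that pairing:

```python
def replica_sets(bricks, replica=2):
    """Group an ordered brick list the way 'volume create ...
    replica 2' does: each run of <replica> consecutive bricks is one
    AFR subvolume (volname-replicate-N, N starting at 0), and DHT
    distributes files across those sets."""
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]

bricks = [f"brick{n}" for n in range(1, 41)]  # stand-ins for the 40 above
# replica_sets(bricks)[6] -> ['brick13', 'brick14']  (pfs-ro1-replicate-6)
```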
Options Reconfigured:
diagnostics.brick-log-level: ERROR
cluster.metadata-change-log: on
diagnostics.client-log-level: ERROR
performance.stat-prefetch: on
performance.cache-size: 2GB
network.ping-timeout: 10
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users