Hi everyone, we have noticed some extremely odd behaviour with our distributed 
Gluster volume where duplicate files (same name, same or different content) are 
being created and stored on multiple bricks. The only consistent clue is that 
one of the duplicate files has the sticky bit set. I am hoping someone will be 
able to shed some light on why this is happening and how we can restore the 
volume as there appear to be hundreds of such files. I will try to provide as 
much pertinent information as I can.
 We have a 130TB Gluster volume consisting of two 20TB bricks on server1 and 
three 40TB bricks on server2, which were added at a later date (and 
rebalancing was done). The volume is mounted on server1 and accessed only 
through this server, but by many users. Both servers went down due to a power 
loss several days ago, after which this problem was first noticed. We ran a 
rebalance command on the volume, but this has not fixed the problem.
 
Gluster volume info:
Volume Name: safari
Type: Distribute
Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/data/glusterfs/safari/brick00/brick
Brick2: server1:/data/glusterfs/safari/brick01/brick
Brick3: server2:/data/glusterfs/safari/brick02/brick
Brick4: server2:/data/glusterfs/safari/brick03/brick
Brick5: server2:/data/glusterfs/safari/brick04/brick
 
Size information:
/dev/sdc 37T 16T 22T 42% /data/glusterfs/safari/brick02
/dev/sdd 37T 16T 22T 42% /data/glusterfs/safari/brick03
/dev/sde 37T 17T 21T 45% /data/glusterfs/safari/brick04
/dev/md126 11T 7.7T 2.8T 74% /data/glusterfs/safari/brick00
/dev/md124 11T 8.0T 2.5T 77% /data/glusterfs/safari/brick01
server2:/safari 130T 63T 68T 48% /sar
 
Example 1:
-Two files with the same name exist in one directory
-They have different contents and attributes
-A file listing on the mounted volume shows the same inode
-The newer file has sticky bit set
-Neither file is corrupted, they can both be viewed by using the absolute path 
(on the bricks)
 File listing on the mounted volume:
13036730497538635177 -rw-rw-r-T 1 jon users 924 Dec 15 10:42 RSLC_tab
13036730497538635177 -rw-rw-r-- 1 jon users 418 Mar 18 2013 RSLC_tab
 Listing of the files on the bricks:
8925798411 -rw-rw-r-T+ 2 jon users 924 Dec 15 10:42 
/data/glusterfs/safari/brick00/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
51541886672 -rw-rw-r--+ 2 1002 users 418 Mar 18 2013 
/data/glusterfs/safari/brick02/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
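
A brick-side comparison like the one above could be scripted. What follows is 
only a minimal Python sketch, not a Gluster tool: the function name 
find_duplicates is hypothetical, and it assumes every brick directory is 
readable from the machine it runs on. Since the bricks here live on two 
servers, the remote ones would have to be exported somehow, or the scan run 
per server and the results merged. It reports every relative path that exists 
as a regular file on more than one brick, along with the sticky bit, size and 
mtime of each copy.

```python
import os
import stat
from collections import defaultdict


def find_duplicates(brick_roots):
    """Return {relative_path: [per-brick info, ...]} for paths found on 2+ bricks."""
    seen = defaultdict(list)
    for root in brick_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Do not descend into Gluster's internal .glusterfs metadata directory.
            if ".glusterfs" in dirnames:
                dirnames.remove(".glusterfs")
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                if not stat.S_ISREG(st.st_mode):
                    continue  # only regular files are compared
                rel = os.path.relpath(full, root)
                seen[rel].append({
                    "brick": root,
                    "sticky": bool(st.st_mode & stat.S_ISVTX),
                    "size": st.st_size,
                    "mtime": st.st_mtime,
                })
    # Keep only the paths that appear on more than one brick.
    return {rel: info for rel, info in seen.items() if len(info) > 1}
```

It would be called with the brick directories from the volume info, for 
example find_duplicates(["/data/glusterfs/safari/brick00/brick", 
"/data/glusterfs/safari/brick02/brick"]). Walking bricks of this size will be 
slow, so restricting the roots to one affected subdirectory first is probably 
wise.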
 
Example 2: 
-Two files with the same name exist in one directory
-They have the same content and attributes
-No sticky bit is set when looking at file listing on the mounted volume
-Sticky bit is set for one file when looking at file listing on the bricks
-Files are corrupted
 File listing on the mounted volume:
13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 
ifg_lr/20130226_20130813.diff.phi.ras
13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 
ifg_lr/20130226_20130813.diff.phi.ras
 Listing of the files on the bricks:
17058578 -rw-rw-r-T+ 2 tom users 2393848 Dec 13 17:11 
/data/glusterfs/safari/brick00/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
57986922129 -rw-rw-r--+ 2 1010 users 2393848 Dec 8 2013 
/data/glusterfs/safari/brick02/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
 
Additionally, only some files in this directory are duplicated. The duplicated 
files are corrupted (they cannot be viewed as raster images, their original 
file type). The files which are not duplicated are not corrupted.
 Output of the file command (note the duplicate and singleton files):
ifg_lr/20091021_20100218.diff.phi.ras: Sun raster image data, 1208 x 1981, 
8-bit, RGB colormap
ifg_lr/20091021_20101016.diff.phi.ras: data
ifg_lr/20091021_20101016.diff.phi.ras: data
ifg_lr/20091021_20101109.diff.phi.ras: Sun raster image data, 1208 x 1981, 
8-bit, RGB colormap
ifg_lr/20091021_20101203.diff.phi.ras: Sun raster image data, 1208 x 1981, 
8-bit, RGB colormap
ifg_lr/20091021_20101227.diff.phi.ras: Sun raster image data, 1208 x 1981, 
8-bit, RGB colormap
ifg_lr/20091021_20110120.diff.phi.ras: Sun raster image data, 1208 x 1981, 
8-bit, RGB colormap
ifg_lr/20091021_20110213.diff.phi.ras: data
ifg_lr/20091021_20110213.diff.phi.ras: data
ifg_lr/20091021_20110309.diff.phi.ras: data
ifg_lr/20091021_20110309.diff.phi.ras: sticky data
ifg_lr/20091021_20110402.diff.phi.ras: Sun raster image data, 1208 x 1981, 
8-bit, RGB colormap
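
The same check can be made directly against each brick copy without going 
through the mount. The sketch below is an illustration under stated 
assumptions, not what the file command does internally: it assumes the 
standard Sun rasterfile header (big-endian 32-bit fields, starting with the 
magic number 0x59a66a95, then width, height and depth), and the function name 
read_sun_raster_header is made up for this example.

```python
import struct

# Magic number at the start of a Sun rasterfile (stored big-endian).
SUN_RASTER_MAGIC = 0x59A66A95


def read_sun_raster_header(path):
    """Return (width, height, depth) if path starts with a Sun raster header, else None."""
    with open(path, "rb") as fh:
        header = fh.read(16)
    if len(header) < 16:
        return None  # too short to hold magic, width, height and depth
    magic, width, height, depth = struct.unpack(">IIII", header)
    if magic != SUN_RASTER_MAGIC:
        return None  # roughly what the file command reports as plain "data"
    return width, height, depth
```

Running it on both brick copies of a duplicated name would show whether 
either copy still starts with a valid header. A None result for both would 
match the "data" lines above.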
 
Information from Gluster log file:
Additionally, the log is full of thousands of lines like the following 
(possibly one for each directory?), dating back several months:
[2014-12-12 11:10:10.257950] I [dht-layout.c:726:dht_layout_dir_mismatch] 
3-safari-dht: /rsc/tsx/lasvegas/spot_asc/stack/ifg_lr - disk layout missing
[2014-12-12 11:10:10.257988] I [dht-common.c:623:dht_revalidate_cbk] 
3-safari-dht: mismatching layouts for /rsc/tsx/lasvegas/spot_asc/stack/ifg_lr
[2014-12-12 11:10:13.042362] I [dht-layout.c:726:dht_layout_dir_mismatch] 
3-safari-dht: /rsc/tsx/lasvegas/spot_dsc/stack/ifg_lr - disk layout missing
[2014-12-12 11:10:13.042395] I [dht-common.c:623:dht_revalidate_cbk] 
3-safari-dht: mismatching layouts for /rsc/tsx/lasvegas/spot_dsc/stack/ifg_lr
[2014-12-12 11:10:15.685876] I [dht-layout.c:726:dht_layout_dir_mismatch] 
3-safari-dht: /rsc/tsx/lasvegas/spot_dsc/stack/ifg_lr - disk layout missing
[2014-12-12 11:10:15.685921] I [dht-common.c:623:dht_revalidate_cbk] 
3-safari-dht: mismatching layouts for /rsc/tsx/lasvegas/spot_dsc/stack/ifg_lr
[2014-12-12 11:10:19.028518] I [dht-layout.c:726:dht_layout_dir_mismatch] 
3-safari-dht: /rsc/tsx/lasvegas/spot_asc/stack/ifg_lr - disk layout missing
 
There are also 1394 of the following errors in the last year; several (but not 
all) of them seem to correspond to duplicate files:
[2014-12-12 22:55:57.180486] W 
[client-rpc-fops.c:1994:client3_3_setattr_cbk] 0-safari-client-1: remote 
operation failed: Operation not permitted
[2014-12-12 22:55:57.180514] E 
[dht-linkfile.c:213:dht_linkfile_setattr_cbk] 0-safari-dht: setattr of uid/gid 
on /freeport/tsx/miami/sm_asc/stack/ifg_lr/20140930_20141102.diff.phi.ras 
:<gfid:00000000-0000-0000-0000-000000000000> failed (Operation not permitted)
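
Counts like these can be tallied from the log with a few lines of Python. This 
is a rough sketch rather than an official log parser: it assumes each entry 
sits on one line in the actual log file (the excerpts in this mail are 
wrapped) and follows the "[timestamp] LEVEL [file:line:function] ..." shape 
seen above. The names LOG_RE and tally are invented for the example.

```python
import re
from collections import Counter

# Matches entries shaped like:
# [2014-12-12 11:10:10.257950] I [dht-layout.c:726:dht_layout_dir_mismatch] 3-safari-dht: ...
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+(?P<rest>.*)"
)


def tally(lines):
    """Count log entries per (level, function); lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)  # search() tolerates a leading line-number column
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts
```

Passing an open log file to tally() and printing counts.most_common() gives a 
quick view of which messages dominate. Filtering on the E level and pulling 
the path out of the rest group would give a list to compare against the 
duplicated files.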
 Thanks,
Tom
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
