Re: [Gluster-users] Vol full of .*.gfs* after migrate-data

Dan Bretherton Mon, 02 Jan 2012 04:08:09 -0800

On Monday 02 January 2012 05:05 AM, Dan Bretherton wrote:
On 03/10/11 19:08, Dan Bretherton wrote:
On 02/10/11 02:12, Amar Tumballi wrote:
Dan,

Answer inline.
On 02-Oct-2011, at 1:26 AM, Dan Bretherton <[email protected]> wrote:
Hello All,
I have been testing rebalance...migrate-data in GlusterFS version 3.2.3, following add-brick and fix-layout. After migrate-data the volume is 97% full, with some bricks being 100% full. I have not added any files to the volume, so there should be an amount of free space at least as big as the new bricks that were added. However, it seems as if all the extra space has been taken up with files matching the pattern .*.gfs*. I presume these are temporary files used for the transfer of real files, which should have been renamed once the transfers were completed and verified, and the original versions deleted. The new bricks contain mostly these temporary files, and zero-byte link files pointing to the corresponding real files on other bricks. An example of such a pair is shown below.
---------T 1 root root 0 Sep 30 03:14 /mnt/local/glusterfs/root/backup/behemoth_system/bin
-rwxr-xr-x 1 root root 60416 Sep 30 18:20 /mnt/local/glusterfs/root/backup/behemoth_system/bin/.df.gfs60416
Is this a known bug, and is there a work-around? If not, is it safe to delete the .*.gfs* files so I can at least use the volume?
This is not a known issue, but it surely seems like a bug. If the source file is intact, you can delete the temp file to get the space back. Also, if the md5sum is the same, you can rename the temp file to the original, so you get space in the existing bricks.
Regards,
Amar
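
Amar's md5sum check can be done in bulk before anything is deleted or renamed. The following is only a minimal sketch, not a tested repair tool. It assumes the temporary files are named ".<name>.gfs<digits>" and sit next to "<name>" (inferred from the examples in this thread), and that it is run on a path where both files are readable, such as the client mount point. The function names are made up for illustration. It reports only and changes nothing.

```python
import hashlib
import os
import re

# Assumed naming, inferred from examples in this thread:
# ".df.gfs60416" is the temporary copy of "df" in the same directory.
TEMP_RE = re.compile(r"^\.(?P<name>.+)\.gfs\d+$")


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_temp_files(root):
    """Yield (temp_path, original_path, verdict) for each temp file under root.

    verdict is "identical", "differs" or "no-original".
    Nothing is deleted or renamed here.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            match = TEMP_RE.match(filename)
            if match is None:
                continue
            temp_path = os.path.join(dirpath, filename)
            original_path = os.path.join(dirpath, match.group("name"))
            if not os.path.isfile(original_path):
                verdict = "no-original"
            elif md5sum(temp_path) == md5sum(original_path):
                verdict = "identical"
            else:
                verdict = "differs"
            yield temp_path, original_path, verdict
```

Looping over classify_temp_files("/path/to/mount") and printing the verdicts gives a list to review by hand. Only the "identical" entries match the case Amar describes as safe.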
Regards
Dan Bretherton

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Amar- Thanks for the information and the patch. The etc-glusterd-mount-<volname>.log file can be downloaded from here:
http://www.nerc-essc.ac.uk/~dab/etc-glusterd-mount-backup.log.tar.gz

I am using CentOS 5.5 by the way.

-Dan.
Hello again-
I tested the patch and I confirm that it works; there are no *.gfs* files in my volume after performing a migrate-data operation. However, there is still something not quite right. One of the replicated brick pairs is 100% full, whereas the others are approximately 50% full. I would have expected all the bricks to contain roughly the same amount of data after migrate-data, and this effect is mainly what I want to use migrate-data for. Do you know why this might have happened or how to avoid it? The log files from the latest migrate-data operation can be downloaded from here:
http://www.nerc-essc.ac.uk/~dab/backup_migrate-data_logs.tar.gz

-Dan.
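
The imbalance described above (one replica pair at 100%, the rest near 50%) is easy to spot automatically once per-brick usage figures are collected, for example with shutil.disk_usage on each server. This is a small sketch under that assumption. The function name, the brick labels and the 10-point tolerance are all hypothetical.

```python
import statistics


def fill_report(usage, tolerance=10.0):
    """Report fill level per brick and flag bricks far from the median.

    usage maps a brick label to (used_bytes, total_bytes).
    Returns (percent_by_brick, sorted list of outlier labels).
    """
    percent = {
        brick: 100.0 * used / total for brick, (used, total) in usage.items()
    }
    median = statistics.median(percent.values())
    outliers = sorted(
        brick for brick, value in percent.items()
        if abs(value - median) > tolerance
    )
    return percent, outliers
```

The median is used rather than the mean so that a single full brick pair does not drag the reference level towards itself.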
Hello Amar and gluster-users,
I have tested rebalance...migrate-data in version 3.2.5 and found three serious problems still present.
1) There are lots of *.gfs* files after migrate-data. This didn't happen when I tested the patched version of 3.2.4.
2) There are lots of duplicate files after migrate-data, i.e. lots of files seen twice at the mount point. I have never seen this happen before, and I would really like to know how to repair the volume. There are ~6000 duplicates out of a total of ~1 million files in the volume, so dealing with each one individually would be impractical.
3) A lot of files have wrong permissions after migrate-data. For example, -rwsr-xr-x commonly becomes -rwxr-xr-x, and -rw-rw-r-- commonly becomes -rw-r--r--.
Are these known problems, and if so is there a new version with fixes in the pipeline?
Regards
Dan.
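
The two permission changes in problem 3 each amount to one lost mode bit: setuid in the first case, group write in the second. This is a minimal sketch of how the lost bits can be named, using Python's standard stat module. The helper lost_mode_bits is a made-up name, and it covers only the special and write bits.

```python
import stat


def lost_mode_bits(before, after):
    """Return names of bits that are set in `before` but cleared in `after`."""
    named_bits = [
        ("setuid", stat.S_ISUID),
        ("setgid", stat.S_ISGID),
        ("sticky", stat.S_ISVTX),
        ("user-write", stat.S_IWUSR),
        ("group-write", stat.S_IWGRP),
        ("other-write", stat.S_IWOTH),
    ]
    lost = before & ~after
    return [name for name, bit in named_bits if lost & bit]


# The two changes reported in this thread, written as regular-file modes.
examples = [
    (stat.S_IFREG | 0o4755, stat.S_IFREG | 0o0755),  # -rwsr-xr-x -> -rwxr-xr-x
    (stat.S_IFREG | 0o0664, stat.S_IFREG | 0o0644),  # -rw-rw-r-- -> -rw-r--r--
]
for before, after in examples:
    print(stat.filemode(before), "->", stat.filemode(after),
          lost_mode_bits(before, after))
```

Given st_mode values recorded before the migration (for example from the backup source), the same function could be applied file by file to list what needs a chmod.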

Hi Dan,


I tried the following steps
----------------------------

1. Created a replicate volume
2. Filled the volume with 100000 files
3. Added two more bricks with very little space to make sure an out-of-space condition occurs (the volume type is now distributed-replicate).
4. After I started the rebalance, once the newly added bricks were full, the log showed "out of disk space" messages, but migration didn't happen.
Now on the mount point I could not see any *.gfs* files, and permissions for these files were the same even after rebalance.
Please let me know if I am missing something.


Thanks,
Shylesh

Hello Shylesh- Thanks for trying to reproduce the problem. There are clearly a lot of files in your test volume but you didn't say how big they are or if any of them are symbolic links. The volume I have been testing migrate-data on is 3.6TB in size and is 64% full. There are 22 bricks on 10 servers. There are ~45000 symlinks and ~1 million files, varying in size from very small (~KB) to very large (several GB). It contains, among other things, an operating system backup (where a lot of the symlinks and small files come from) and ~1.5TB of large NetCDF (http://www.unidata.ucar.edu/software/netcdf/) files. All the servers have bricks belonging to other volumes as well as the volume being tested. I don't know if any of these volume or file characteristics are responsible for the problems that occurred during the migrate-data process. It would be relatively easy to have several servers involved in your test volume and several bricks per server. Perhaps you could transfer some Linux system files onto it to recreate some of the features of my test volume, and it would also be worth having a range of file sizes up to ~8GB, like the NetCDF files I referred to.

The volume mount log file on the server that carried out the migrate-data operation contains a lot of errors like the following.


   [2011-12-31 11:11:44.621287] E [client3_1-fops.c:2056:client3_1_link_cbk] 0-backup-client-17: remote operation failed: File exists
   [2011-12-31 11:11:44.621330] E [client3_1-fops.c:2056:client3_1_link_cbk] 0-backup-client-16: remote operation failed: File exists
   [2011-12-31 11:11:44.623785] W [fuse-bridge.c:1354:fuse_rename_cbk] 0-glusterfs-fuse: 46317596: /users/dab/backup/previous_version/data/gorgon/users/dab/Installations/glibc-2.5/sysdeps/unix/sysv/linux/s390/s390-32/.versionsort64.c.gfs56 -> /users/dab/backup/previous_version/data/gorgon/users/dab/Installations/glibc-2.5/sysdeps/unix/sysv/linux/s390/s390-32/versionsort64.c => -1 (File exists)

The brick log files contain some errors relating to *.gfs* files, such as the following.


   [2011-12-31 11:27:26.348954] I [server3_1-fops.c:1050:server_link_cbk] 0-backup-server: 9546975: LINK /users/dab/backup/previous_version/data/gorgon/users/dab/Installations/apache-ant-1.7.0/docs/antlibs/.index.html.gfs7312 (-3867887006) ==> -1 (File exists)
   [2011-12-31 11:27:26.917764] E [posix.c:2213:posix_link] 0-backup-posix: link /users/dab/backup/previous_version/data/gorgon/users/dab/Installations/apache-ant-1.7.0/bin/.antRun.pl.gfs2199 to /users/dab/backup/previous_version/data/gorgon/users/dab/Installations/apache-ant-1.7.0/bin/antRun.pl failed:File exists
   [2011-12-31 11:27:26.917796] I [server3_1-fops.c:1050:server_link_cbk] 0-backup-server: 9547232: LINK /users/dab/backup/previous_version/data/gorgon/users/dab/Installations/apache-ant-1.7.0/bin/.antRun.pl.gfs2199 (-2737554403) ==> -1 (File exists)

   [2011-12-31 12:08:28.88889] E [posix.c:921:posix_setattr] 0-backup-posix: setattr (lstat) on /mnt/local/glusterfs/users/dab/backup/data/gorgon/users/dab/SGE/src/gridengine/source/dist/mpi/myrinet/.gmps.gfs3139 failed: No such file or directory
   [2011-12-31 12:08:28.170369] I [server3_1-fops.c:1526:server_setattr_cbk] 0-backup-server: 5421009: SETATTR /users/dab/backup/data/gorgon/users/dab/SGE/src/gridengine/source/dist/mpi/myrinet/.gmps.gfs3139 (-783637172) ==> -1 (No such file or directory)
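
With logs of this size it may help to tally the entries rather than read them one by one. The following is a rough sketch that assumes each log entry is on a single line in the form "[timestamp] LEVEL [file:line:function] translator: message", as in the excerpts above. The regular expressions and the summarize function are illustrative guesses based on those excerpts, not a documented log format.

```python
import re
from collections import Counter

# Assumed entry layout: [timestamp] LEVEL [file:line:function] translator: message
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+(?P<xlator>\S+?):\s+(?P<message>.*)$"
)
# A path component such as "/.antRun.pl.gfs2199" (assumed temp-file naming).
TEMP_PATH_RE = re.compile(r"/\.[^/\s]+\.gfs\d+(?=\s|$)")
KNOWN_ERRORS = ("File exists", "No such file or directory")


def summarize(lines):
    """Count entries per (level, function, error text, mentions temp file)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip continuation lines and anything unparseable
        function = match.group("source").rsplit(":", 1)[-1]
        message = match.group("message")
        error = next((e for e in KNOWN_ERRORS if e in message), "other")
        touches_temp = TEMP_PATH_RE.search(message) is not None
        counts[(match.group("level"), function, error, touches_temp)] += 1
    return counts
```

Passing an open log file to summarize and printing counts.most_common() shows which operations fail most often and whether they involve the temporary files.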

The log files can be downloaded as files bdan0.tar.gz, bdan12.tar.gz, bdan13.tar.gz, bdan14.tar.gz, bdan1.tar.gz, bdan2.tar.gz, bdan3.tar.gz, bdan6.tar.gz, bdan7.tar.gz and perseus.tar.gz from http://www.nerc-essc.ac.uk/~dab.

The migrate-data operation was carried out on a server called perseus. Here is the volume information so you can make sense of the logs.

[root@bdan12 ~]# gluster volume info backup

Volume Name: backup
Type: Distributed-Replicate
Status: Started
Number of Bricks: 11 x 2 = 22
Transport-type: tcp
Bricks:
Brick1: bdan0.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick2: bdan1.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick3: bdan2.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick4: bdan3.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick5: bdan6.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick6: bdan7.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick7: perseus.nerc-essc.ac.uk:/mnt/local/glusterfs
Brick8: perseus.nerc-essc.ac.uk:/mnt/local2/glusterfs
Brick9: perseus.nerc-essc.ac.uk:/backup/glusterfs
Brick10: bdan14.nerc-essc.ac.uk:/backup/glusterfs
Brick11: perseus.nerc-essc.ac.uk:/backup2/glusterfs
Brick12: bdan14.nerc-essc.ac.uk:/backup2/glusterfs
Brick13: bdan12.nerc-essc.ac.uk:/backup2/glusterfs
Brick14: bdan13.nerc-essc.ac.uk:/backup2/glusterfs
Brick15: bdan12.nerc-essc.ac.uk:/backup3/glusterfs
Brick16: bdan13.nerc-essc.ac.uk:/backup3/glusterfs
Brick17: bdan12.nerc-essc.ac.uk:/backup/glusterfs
Brick18: bdan13.nerc-essc.ac.uk:/backup/glusterfs
Brick19: perseus.nerc-essc.ac.uk:/backup3/glusterfs
Brick20: bdan14.nerc-essc.ac.uk:/backup3/glusterfs
Brick21: perseus.nerc-essc.ac.uk:/backup4/glusterfs
Brick22: bdan14.nerc-essc.ac.uk:/backup4/glusterfs
Options Reconfigured:
performance.quick-read: off
performance.cache-refresh-timeout: 0
performance.stat-prefetch: off
cluster.min-free-disk: 34GB
nfs.disable: on
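
To relate the logs to the brick list above: with "11 x 2 = 22", consecutive bricks in the listing form the replica pairs (Brick1 with Brick2, Brick3 with Brick4, and so on). The sketch below groups the bricks that way from the text of "gluster volume info". The function names are hypothetical. The mapping from a log name such as backup-client-17 to a brick assumes the client translators are numbered from 0 in brick order, which is not confirmed anywhere in this thread.

```python
import re

BRICK_RE = re.compile(r"^Brick(\d+):\s*(\S+)$")


def list_bricks(volume_info):
    """Return brick names from `gluster volume info` text, in brick order."""
    bricks = []
    for line in volume_info.splitlines():
        match = BRICK_RE.match(line.strip())
        if match:
            bricks.append((int(match.group(1)), match.group(2)))
    return [name for _index, name in sorted(bricks)]


def replica_sets(volume_info, replica_count=2):
    """Group bricks into replica sets of `replica_count`, in listed order."""
    names = list_bricks(volume_info)
    return [
        names[i:i + replica_count] for i in range(0, len(names), replica_count)
    ]


def brick_for_client(volume_info, client_index):
    """Assumption: <volname>-client-N refers to the (N+1)th brick listed."""
    return list_bricks(volume_info)[client_index]
```

Under that assumption, backup-client-16 and backup-client-17 from the mount log would be Brick17 and Brick18, which are one replica pair in the listing above.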

Regards
Dan.
