Hi Alessandro,
What you describe here reminds me of this issue:
http://www.spinics.net/lists/gluster-users/msg20144.html
And now that you mention it, the mess on our cluster could indeed have
been triggered by an aborted rebalance.
This is a very important clue, since apparently developers were never
able to reproduce the issue in the lab. I also tried to reproduce the
issue on a test cluster, but never succeeded.
The example you describe below seems to me relatively easy to fix. A
rebalance fix-layout would eventually get rid of the sticky-bit link
files (---------T) on your bricks 5 and 6, and you could manually remove
the files created on 10/03 as long as you also remove the corresponding
link file in the .glusterfs dir on that brick.
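For reference, a sketch of how to locate that .glusterfs link file; the
brick path and gfid below are taken from your brick 5/6 listing, adapt
as needed. The hard link lives under
.glusterfs/<first 2 hex digits>/<next 2 hex digits>/<full gfid>:

  # read the gfid of the stale link file on the brick
  getfattr -n trusted.gfid -e hex /data/glusterfs/home/brick2/seviri/.forward
  # for gfid 14a1c10e-b147-4ef2-bf72-f4c6c64a90ce the link is at:
  ls -l /data/glusterfs/home/brick2/.glusterfs/14/a1/14a1c10e-b147-4ef2-bf72-f4c6c64a90ce
  # remove both the file and this link, but only after making a backup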
I wholeheartedly agree with you that this needs the urgent attention of
the developers before they start working on new features. A mess like
this in a distributed file system makes the file system unusable for
production. This should never happen, never! And if it does, a rebalance
should be able to detect and fix it... fast and efficiently. I also
agree that the status of a rebalance should be more telling, giving a
clear idea of how long it would still take to complete. On large
clusters a rebalance often takes ages and makes the entire cluster
extremely vulnerable. (Another scary operation is remove-brick, but
that is another story.)
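For completeness, these are the commands I am referring to; I am
assuming your volume is called "home", as the trusted.afr.home-client-*
attributes in your listing suggest:

  gluster volume rebalance home fix-layout start
  gluster volume rebalance home status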
What I did in our case, and maybe this could help you too as a quick
fix for the most critical directories, is to rsync the data to a
different storage system (via a mount point). rsync copies only one
instance of duplicated files, and you could separately copy a good
version of the problem files (in the case below e.g.:
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward).
But probably, as soon as you remove the files created on 10/03
(including the gluster link file in .glusterfs), the listing via your
NFS mount will be restored. Try this out with a couple of files you
have backed up, to be sure.
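Something along these lines is what I mean; the mount points below are
just examples, substitute your own:

  # copy via the glusterfs mount point, so each file is read only once
  rsync -av /mnt/home/seviri/ /mnt/backup/seviri/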
Hope this helps!
Cheers,
Olav
On 20/03/15 12:22, Alessandro Ipe wrote:
Hi,
After launching a "rebalance" on an idle gluster system one week ago,
its status told me it had scanned more than 23 million files on each of
my 6 bricks. However, without knowing at least the total number of
files to be scanned, this status is USELESS from an end-user
perspective, because it does not allow you to know WHEN the rebalance
could eventually complete (one day, one week, one year or never). From
my point of view, the total number of files per brick could be obtained
and maintained when activating quota, since the whole filesystem has to
be crawled anyway...
After one week of being offline and still no clue when the rebalance
would complete, I decided to stop it... Enormous mistake... It seems
that rebalance cannot manage not to screw up some files. For example,
on the only client mounting the gluster system, "ls -la /home/seviri"
returns
ls: cannot access /home/seviri/.forward: Stale NFS file handle
ls: cannot access /home/seviri/.forward: Stale NFS file handle
-????????? ? ? ? ? ? .forward
-????????? ? ? ? ? ? .forward
while this file could be accessed perfectly well before the rebalance
and has not been modified for at least 3 years.
Getting the extended attributes on the various bricks 3, 4, 5, 6 (3-4
replicate, 5-6 replicate):
Brick 3:
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001
Brick 4:
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001
Brick 5:
ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
Brick 6:
ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
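Decoding the hex trusted.glusterfs.dht.linkto value to ASCII (the
trailing 00 is a NUL terminator) shows which subvolume DHT points to:

  echo 686f6d652d7265706c69636174652d3400 | xxd -r -p
  home-replicate-4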
Looking at the results from bricks 3 & 4 shows something weird. The
file exists in two sub-brick storage directories (brick1 and brick2),
while it should only be found once on each brick server. Or does the
issue lie in the results of bricks 5 & 6? How can I fix this, please?
By the way, the split-brain tutorial only covers BASIC split-brain
conditions and not complex (real life) cases like this one. It would
definitely benefit from being enriched with this one.
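For reference, the standard way to list files in split-brain (volume
name "home" assumed from the client attributes above):

  gluster volume heal home info split-brain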
More generally, I think the concept of gluster is promising, but if
basic commands (rebalance, absolutely needed after adding more storage)
from its own CLI can put the system into an unstable state, I am really
starting to question its ability to be used in a production
environment. And from an end-user perspective, I do not care about new
features, no matter how appealing they may be, if the basic ones are
not almost totally reliable. Finally, testing gluster under high load
on the brick servers (real world conditions) would certainly give the
developers insight into what is failing and what therefore needs to be
fixed to mitigate this and improve gluster's reliability.
Forgive my harsh words/criticisms, but having to struggle with gluster
issues for two weeks now is getting on my nerves, since my colleagues
cannot use the data stored on it and I cannot tell when it will be back
online.
Regards,
Alessandro.
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users