Hi Alessandro,
What you describe here reminds me of this issue:
http://www.spinics.net/lists/gluster-users/msg20144.html
And now that you mention it, the mess on our cluster could indeed have
been triggered by an aborted rebalance.
This is a very important clue, since apparently developers were never
able to reproduce the issue in the lab. I also tried to reproduce the
issue on a test cluster, but never succeeded.
The example you describe below seems to me relatively easy to fix. A
rebalance fix-layout would eventually get rid of the sticky-bit link
files (---------T) on your bricks 5 and 6, and you could manually remove
the files created on 10/03 as long as you also remove the corresponding
link file in the .glusterfs dir on that brick.
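For reference, a sketch of how to locate that .glusterfs link file; the
brick path and gfid below are taken from your brick 5/6 listing, adapt
as needed. The hard link lives under
.glusterfs/<first 2 hex digits>/<next 2 hex digits>/<full gfid>:

  # read the gfid of the stale link file on the brick
  getfattr -n trusted.gfid -e hex /data/glusterfs/home/brick2/seviri/.forward
  # for gfid 14a1c10e-b147-4ef2-bf72-f4c6c64a90ce the link is at:
  ls -l /data/glusterfs/home/brick2/.glusterfs/14/a1/14a1c10e-b147-4ef2-bf72-f4c6c64a90ce
  # remove both the file and this link, but only after making a backup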
I wholeheartedly agree with you that this needs the urgent attention of
the developers before they start working on new features. A mess like
this in a distributed file system makes the file system unusable for
production. This should never happen, never! And if it does, a rebalance
should be able to detect and fix it... fast and efficiently. I also
agree that the status of a rebalance should be more telling, giving a
clear idea of how long it would still take to complete. On large
clusters a rebalance often takes ages and makes the entire cluster
extremely vulnerable. (Another scary operation is remove-brick, but
that is another story.)
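For completeness, these are the commands I am referring to; I am
assuming your volume is called "home", as the trusted.afr.home-client-*
attributes in your listing suggest:

  gluster volume rebalance home fix-layout start
  gluster volume rebalance home status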
What I did in our case, and maybe this could help you too as a quick
fix for the most critical directories, is to rsync the data to a
different storage system (via a mount point). rsync copies only one
instance of duplicated files, and you could separately copy a good
version of the problem files (in the case below e.g.:
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward).
But probably, as soon as you remove the files created on 10/03
(including the gluster link file in .glusterfs), the listing via your
NFS mount will be restored. Try this out with a couple of files you
have backed up, to be sure.
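Something along these lines is what I mean; the mount points below are
just examples, substitute your own:

  # copy via the glusterfs mount point, so each file is read only once
  rsync -av /mnt/home/seviri/ /mnt/backup/seviri/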
Hope this helps!
Cheers,
Olav
On 20/03/15 12:22, Alessandro Ipe wrote:
Hi,
After launching a "rebalance" on an idle gluster system one week ago,
its status told me it had scanned more than 23 million files on each of
my 6 bricks. However, without knowing at least the total number of
files to be scanned, this status is USELESS from an end-user
perspective, because it does not allow you to know WHEN the rebalance
could eventually complete (one day, one week, one year or never). From
my point of view, the total number of files per brick could be obtained
and maintained when activating quota, since the whole filesystem has to
be crawled anyway...
After one week of being offline and still no clue when the rebalance
would complete, I decided to stop it... Enormous mistake... It seems
that rebalance cannot manage not to screw up some files. For example,
on the only client mounting the gluster system, "ls -la /home/seviri"
returns
ls: cannot access /home/seviri/.forward: Stale NFS file handle
ls: cannot access /home/seviri/.forward: Stale NFS file handle
-????????? ? ? ? ? ? .forward
-????????? ? ? ? ? ? .forward
while this file could be accessed perfectly well before the rebalance
and has not been modified for at least 3 years.
Getting the extended attributes on the various bricks 3, 4, 5, 6 (3-4
replicate, 5-6 replicate):
Brick 3:
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001
Brick 4:
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001
Brick 5:
ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
Brick 6:
ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward
getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
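Decoding the hex trusted.glusterfs.dht.linkto value to ASCII (the
trailing 00 is a NUL terminator) shows which subvolume DHT points to:

  echo 686f6d652d7265706c69636174652d3400 | xxd -r -p
  home-replicate-4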
Looking at the results from bricks 3 & 4 shows something weird. The
file exists in two sub-brick storage directories (brick1 and brick2),
while it should only be found once on each brick server. Or does the
issue lie in the results of bricks 5 & 6? How can I fix this, please?
By the way, the split-brain tutorial only covers BASIC split-brain
conditions and not complex (real life) cases like this one. It would
definitely benefit from being enriched with this one.
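For reference, the standard way to list files in split-brain (volume
name "home" assumed from the client attributes above):

  gluster volume heal home info split-brain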
More generally, I think the concept of gluster is promising, but if
basic commands (rebalance, absolutely needed after adding more storage)
from its own CLI can put the system into an unstable state, I am really
starting to question its ability to be used in a production
environment. And from an end-user perspective, I do not care about new
features, no matter how appealing they may be, if the basic ones are
not almost totally reliable. Finally, testing gluster under high load
on the brick servers (real world conditions) would certainly give the
developers insight into what is failing and what therefore needs to be
fixed to mitigate this and improve gluster's reliability.
Forgive my harsh words/criticisms, but having to struggle with gluster
issues for two weeks now is getting on my nerves, since my colleagues
cannot use the data stored on it and I cannot tell when it will be back
online.
Regards,
Alessandro.
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users