Hello,
@Krutika: Thanks for forwarding my issue.
Things are getting out of hand: other quotas are now exploding. After I
removed the quota that was failing, the usage reported for several other
quotas grew, as you can see below:
[root@lucifer ~]# gluster volume quota vol_home list
Path                   Hard-limit  Soft-limit  Used     Available
--------------------------------------------------------------------------------
/baaden_team           20.0TB      90%         15.1TB   4.9TB
/sterpone_team         14.0TB      90%         25.5TB   0Bytes
/simlab_team           5.0TB       90%         1.3TB    3.7TB
/sacquin_team          10.0TB      90%         8.3TB    1.7TB
/admin_team            1.0TB       90%         17.0GB   1007.0GB
/amyloid_team          7.0TB       90%         6.4TB    577.5GB
/amyloid_team/nguyen   4.0TB       90%         3.7TB    312.7GB
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/sterpone_team
cl-storage1: 3,1T /export/brick_home/brick1/sterpone_team
cl-storage1: 2,3T /export/brick_home/brick2/sterpone_team
cl-storage3: 2,7T /export/brick_home/brick1/sterpone_team
cl-storage3: 2,9T /export/brick_home/brick2/sterpone_team
=> ~11TB (not 25.5TB!!!)
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/baaden_team
cl-storage1: 4,2T /export/brick_home/brick1/baaden_team
cl-storage3: 3,7T /export/brick_home/brick1/baaden_team
cl-storage1: 3,6T /export/brick_home/brick2/baaden_team
cl-storage3: 3,5T /export/brick_home/brick2/baaden_team
=> ~15TB (not 14TB).
Etc.
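
For anyone who wants to re-check the brick totals, here is a minimal sketch of the sum in Python. The function names are mine (hypothetical), and it rests on three assumptions: `du -sh` sizes are powers of 1024, the decimal comma comes from the French locale, and the bricks on cl-storage1 and cl-storage3 together hold exactly one copy of the data.

```python
import re

# Size suffixes printed by `du -sh`, expressed in TB (powers of 1024 assumed).
UNIT_TO_TB = {"K": 1 / 1024**3, "M": 1 / 1024**2, "G": 1 / 1024, "T": 1.0}


def du_size_to_tb(size: str) -> float:
    """Convert a `du -sh` size such as '3,1T' or '590G' to TB."""
    match = re.fullmatch(r"(\d+(?:[.,]\d+)?)([KMGT])", size.strip())
    if match is None:
        raise ValueError(f"unrecognised du size: {size!r}")
    number, unit = match.groups()
    # The French locale prints a decimal comma; float() needs a dot.
    return float(number.replace(",", ".")) * UNIT_TO_TB[unit]


def total_tb(pdsh_output: str) -> float:
    """Sum the sizes of 'host: size path' lines from `pdsh ... du -sh`."""
    total = 0.0
    for line in pdsh_output.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # host, size, path
            total += du_size_to_tb(fields[1])
    return total


# Output copied from the pdsh command above.
STERPONE_DU = """\
cl-storage1: 3,1T /export/brick_home/brick1/sterpone_team
cl-storage1: 2,3T /export/brick_home/brick2/sterpone_team
cl-storage3: 2,7T /export/brick_home/brick1/sterpone_team
cl-storage3: 2,9T /export/brick_home/brick2/sterpone_team
"""

on_disk = total_tb(STERPONE_DU)
reported = 25.5  # "Used" column of the quota list for /sterpone_team, in TB
print(f"on disk: {on_disk:.1f}TB, quota reports: {reported:.1f}TB "
      f"({reported / on_disk:.1f}x too high)")
```

The same function can be fed the output for any other team directory to compare it with the "Used" column of the quota list.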
Could you please help me solve this issue urgently? The situation is
blocking, and I have to stop production until it is fixed.
Do you think upgrading the storage cluster to GlusterFS 3.7.1 (the latest
version) could fix the problem?
Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]
On 27 June 2015 at 08:13, Krutika Dhananjay <[email protected]> wrote:
> Copying Vijai and Raghavendra for help...
>
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Saturday, June 27, 2015 2:13:52 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> Hi Krutika,
>
> Since I re-enabled the quota feature on my volume vol_home, one of the
> defined quotas has gone haywire, and it is a very big problem for us.
>
> All day long after re-enabling it, I watched the reported used space
> grow (with no user I/O on the volume).
>
> [root@lucifer ~]# gluster volume quota vol_home list|grep derreumaux_team
> /derreumaux_team 14.0TB 80% 13.7TB 357.2GB
> [root@lucifer ~]# gluster volume quota vol_home list /derreumaux_team
> Path Hard-limit Soft-limit Used Available
> --------------------------------------------------------------------------------
> /derreumaux_team 14.0TB 80% 13.1TB 874.1GB
> [root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/derreumaux_team
> cl-storage3: 590G /export/brick_home/brick1/derreumaux_team
> cl-storage3: 611G /export/brick_home/brick2/derreumaux_team
> cl-storage1: 567G /export/brick_home/brick1/derreumaux_team
> cl-storage1: 564G /export/brick_home/brick2/derreumaux_team
>
> As you can see, these three commands give three different results and,
> worse, the quota system is very far from the disk space actually used
> (13.7TB vs 13.1TB, against only ~2.3TB on disk).
>
> Can you please help fix this quickly? The whole group is completely
> blocked by the exceeded quota.
>
> Thank you so much in advance,
> Have a nice weekend,
> Geoffrey
>
> On 26 June 2015 at 10:29, Krutika Dhananjay <[email protected]> wrote:
>
> No but if you are saying it is 3.5.3 rpm version, then that bug does not
> exist there.
> And still it is weird how you are seeing such bad performance. :-/
> Anything suspicious in the logs?
>
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Friday, June 26, 2015 1:27:16 PM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> No, it's the 3.5.3 RPM version I found in your repository (published in
> November 2014).
> So, are you suggesting that I simply upgrade all servers and clients to the
> new 3.5.4 version? Wouldn't it be better to upgrade the whole system (servers
> and clients) to version 3.7.1?
>
> Geoffrey
>
> On 26 June 2015 at 09:03, Krutika Dhananjay <[email protected]> wrote:
>
> Also, so are you running 3.5.3 rpms on the clients? Or is it a patched
> version with more fixes on top of 3.5.3?
> The reason I ask this is because there was one performance issue introduced
> after 3.5.3 and fixed by 3.5.4 in replication module. I'm wondering if that
> could be causing the issue you experience.
>
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Friday, June 26, 2015 10:05:26 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> Hi Krutika,
>
> Oops, I disabled the quota manager without saving its configuration. Could
> you tell me how to retrieve the quota list information?
>
> I'm going to test the untar in the meantime.
>
> Geoffrey
>
> On 26 June 2015 at 04:56, Krutika Dhananjay <[email protected]> wrote:
>
> Hi,
>
> So i tried out kernel src tree untar locally on a plain replicate (1x2)
> volume and it took me 7m30sec on an average. This was on vms and there was no
> rdma and there was no quota enabled.
> Could you try the same thing on a volume without quota to see if it makes a
> difference to the perf?
>
> -Krutika
>
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Wednesday, June 24, 2015 10:21:13 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> Hi Krutika,
>
> OK, thank you very much in advance.
> Concerning the quota system, are you in touch with Vijaykumar? I have been
> waiting for an answer for a couple of days, if not more.
>
> One more time, thank you.
> Have a nice day (in France it’s 6:50 AM).
> Geoffrey
>
> On 24 June 2015 at 05:55, Krutika Dhananjay <[email protected]> wrote:
>
> Ok so for anything related to replication, I could help you out.
> But for quota, it would be better to ask Vijaikumar Mallikarjuna or
> Raghavendra G on the mailing list.
> I used to work on quota, long time back. But now I am not in touch with the
> component anymore and do not know of the latest changes to it.
> For the performance issue, I will try linux kernel src untar on my machines
> and let you know what I find.
>
> -Krutika
>
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Monday, June 22, 2015 9:00:52 PM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> Hi Krutika,
>
> Sorry for the delay, but I was in meetings all day.
>
> > Good to hear from you as well. :)
> ;-)
> > So you are seeing this bad performance only in 3.5.3? Any other releases you
> > tried this test on, where the results were much better with replication?
> Yes, but I'm not sure my issue is specific to this release. A few days ago,
> the untar process (with the same version of GlusterFS) took around 8 minutes;
> now it takes around 32 minutes. 8 was already too much, but what about 32
> minutes? :)
>
> That said, my problem only concerns small files: if I use dd (or another
> tool) with a single big file, everything is OK (client write throughput:
> ~1GB/s => ~500MB/s on each replica).
>
> If I run my bench on my distributed-only volume, I get good performance
> (untar: ~1m44s, etc.).
>
> In addition, and I don't know whether it matters, I have some trouble with
> GlusterFS group quotas: there are many mismatches between quota size and
> actual file size, and a lot of "quota xattrs not found" messages from the
> quota-verify GlusterFS tool. You can find an extract of the quota-verify
> output attached.
>
> > If so, could you please let me know? Meanwhile let me try the untar myself on
> > my vms to see what could be causing the perf issue.
> OK, thanks.
>
> See you,
> Geoffrey
>
> On 22 June 2015 at 11:35, Krutika Dhananjay <[email protected]> wrote:
>
> Hi Geoffrey,
>
> Good to hear from you as well. :)
> Ok so you say disabling write-behind does not help. Makes me wonder what the
> problem could be.
> So you are seeing this bad performance only in 3.5.3? Any other releases you
> tried this test on, where the results were much better with replication?
> If so, could you please let me know? Meanwhile let me try the untar myself on
> my vms to see what could be causing the perf issue.
>
> -Krutika
>
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Monday, June 22, 2015 10:14:26 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> Hi Krutika,
>
> It's good to hear from you again :)
>
> Here are my answers:
> 1- Could you remind me how to tell whether self-heal is currently in
> progress? I don't see any special mount point (except the
> /var/run/gluster/vol_home one) or dedicated process, but maybe I'm looking
> in the wrong place.
> 2- OK, I just disabled the write-behind option and reran the bench. I'll let
> you know more when I arrive at my office (I'm still at home right now).
>
> See you, and thank you for helping.
> Geoffrey
>
> On 22 June 2015 at 04:35, Krutika Dhananjay <[email protected]> wrote:
>
> Hi Geoffrey,
>
> 1. Was self-heal also in progress while I/O was happening on the volume?
> 2. Also, there seem to be quite a few fsyncs which could possibly have slowed
> things down a bit. Could you disable write-behind and try getting the time
> stats one more time, to eliminate the possibility of write-behind's presence
> causing out-of-order writes to increase the number of fsyncs by the
> replication module?
>
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: [email protected]
> Sent: Saturday, June 20, 2015 6:04:40 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
>
> Hello again,
>
> For comparison, here is the output of the same script run on a
> distributed-only volume (2 of the 4 servers previously described, 2 bricks
> each):
> #######################################################
> ################ UNTAR time consumed ################
> #######################################################
>
>
> real 1m44.698s
> user 0m8.891s
> sys 0m8.353s
>
> #######################################################
> ################# DU time consumed ##################
> #######################################################
>
> 554M linux-4.1-rc6
>
> real 0m21.062s
> user 0m0.100s
> sys 0m1.040s
>
> #######################################################
> ################# FIND time consumed ################
> #######################################################
>
> 52663
>
> real 0m21.325s
> user 0m0.104s
> sys 0m1.054s
>
> #######################################################
> ################# GREP time consumed ################
> #######################################################
>
> 7952
>
> real 0m43.618s
> user 0m0.922s
> sys 0m3.626s
>
> #######################################################
> ################# TAR time consumed #################
> #######################################################
>
>
> real 0m50.577s
> user 0m29.745s
> sys 0m4.086s
>
> #######################################################
> ################# RM time consumed ##################
> #######################################################
>
>
> real 0m41.133s
> user 0m0.171s
> sys 0m2.522s
>
> The performance difference is striking!
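
To put numbers on that difference, the `real` timings above can be set against those of the replicated volume from the first message of this thread (quoted further down). This is only a rough sketch: the helper names are mine (hypothetical), each figure comes from a single run, and a ratio of wall-clock times says nothing about the cause.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time` value such as '32m42.967s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# (replicated volume, distributed-only volume) "real" timings from the thread.
TIMINGS = {
    "untar": ("32m42.967s", "1m44.698s"),
    "du": ("0m25.060s", "0m21.062s"),
    "find": ("0m25.687s", "0m21.325s"),
    "grep": ("2m15.890s", "0m43.618s"),
    "tar": ("1m5.551s", "0m50.577s"),
    "rm": ("2m51.485s", "0m41.133s"),
}


def slowdowns(timings: dict) -> dict:
    """Ratio of replicated to distributed-only wall-clock time per operation."""
    return {
        op: to_seconds(replicated) / to_seconds(distributed)
        for op, (replicated, distributed) in timings.items()
    }


for op, ratio in slowdowns(TIMINGS).items():
    print(f"{op:>5}: {ratio:.1f}x slower on the replicated volume")
```

The operations that create or delete many small files (untar, rm) show by far the largest ratios, while the metadata scans (du, find) are close to 1.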
>
> Geoffrey
>
> On 20 June 2015 at 02:12, Geoffrey Letessier <[email protected]> wrote:
>
> Dear all,
>
> I just noticed that I/O performance on the main volume of my HPC cluster has
> become remarkably poor.
>
> When running some file operations on a compressed Linux kernel source
> archive, the untar operation alone can take more than half an hour (the
> archive is roughly 80MB and contains about 52,000 files), as you can see
> below:
> #######################################################
> ################ UNTAR time consumed ################
> #######################################################
>
>
> real 32m42.967s
> user 0m11.783s
> sys 0m15.050s
>
> #######################################################
> ################# DU time consumed ##################
> #######################################################
>
> 557M linux-4.1-rc6
>
> real 0m25.060s
> user 0m0.068s
> sys 0m0.344s
>
> #######################################################
> ################# FIND time consumed ################
> #######################################################
>
> 52663
>
> real 0m25.687s
> user 0m0.084s
> sys 0m0.387s
>
> #######################################################
> ################# GREP time consumed ################
> #######################################################
>
> 7952
>
> real 2m15.890s
> user 0m0.887s
> sys 0m2.777s
>
> #######################################################
> ################# TAR time consumed #################
> #######################################################
>
>
> real 1m5.551s
> user 0m26.536s
> sys 0m2.609s
>
> #######################################################
> ################# RM time consumed ##################
> #######################################################
>
>
> real 2m51.485s
> user 0m0.167s
> sys 0m1.663s
>
> For information, this volume is a distributed-replicated one made up of
> 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with
> good native performance (around 1.2GB/s).
>
> In comparison, when I use dd to generate a 100GB file on the same volume, my
> write throughput is around 1GB/s (client side) and 500MB/s (server side)
> because of replication:
> Client side:
> [root@node056 ~]# ifstat -i ib0
> ib0
> KB/s in KB/s out
> 3251.45 1.09e+06
> 3139.80 1.05e+06
> 3185.29 1.06e+06
> 3293.84 1.09e+06
> ...
>
> Server side:
> [root@lucifer ~]# ifstat -i ib0
> ib0
> KB/s in KB/s out
> 561818.1 1746.42
> 560020.3 1737.92
> 526337.1 1648.20
> 513972.7 1613.69
> ...
>
> DD command:
> [root@node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
> 100000+0 records in
> 100000+0 records out
> 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
>
> So this issue doesn't seem to come from the network (which is InfiniBand
> in this case).
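
The dd figure and the ifstat readings can be cross-checked with a little arithmetic. A minimal sketch, assuming a replica count of 2 (implied by the ~500MB/s per replica, but not stated explicitly in this thread) and assuming the client sends every write to each replica; the names below are hypothetical.

```python
def throughput_mb_s(num_bytes: int, seconds: float) -> float:
    """Average throughput in MB/s (1 MB = 10**6 bytes, as dd reports it)."""
    return num_bytes / seconds / 1e6


REPLICA_COUNT = 2  # assumption: not stated explicitly in the thread

# Byte count and duration taken from the dd output above.
dd_rate = throughput_mb_s(104_857_600_000, 202.99)

# If the client writes each block once per replica, its outgoing network
# traffic should be about REPLICA_COUNT times the rate dd reports.
client_egress = dd_rate * REPLICA_COUNT

print(f"dd rate: {dd_rate:.0f} MB/s")
print(f"expected client egress: {client_egress:.0f} MB/s")
```

The expected client egress is of the same order as the roughly 1.05e+06 to 1.09e+06 KB/s that ifstat shows on the client, and the dd rate is of the same order as what one server receives, which fits the replication explanation for large sequential writes.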
>
> Attached you will find a set of files:
> - mybench.sh: the bench script
> - benches.txt: output of my "bench"
> - profile.txt: gluster volume profile during the "bench"
> - vol_status.txt: gluster volume status
> - vol_info.txt: gluster volume info
>
> Can someone help me fix this? It is very critical because this volume is on
> an HPC cluster in production.
>
> Thanks in advance,
> Geoffrey
> <benches.txt>
>
> <mybench.sh>
>
> <profile.txt>
>
> <vol_info.txt>
>
> <vol_status.txt>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-users