Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance

Geoffrey Letessier Sun, 28 Jun 2015 01:05:13 -0700

Hello,

@Krutika: Thanks for forwarding my issue.


Things are getting completely out of hand: other quotas are now blowing up too. 
After I removed the previously failing quota, several other quotas grew, as you 
can see below:

[root@lucifer ~]# gluster volume quota vol_home list
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/baaden_team                              20.0TB       90%      15.1TB   4.9TB
/sterpone_team                            14.0TB       90%      25.5TB  0Bytes
/simlab_team                               5.0TB       90%       1.3TB   3.7TB
/sacquin_team                             10.0TB       90%       8.3TB   1.7TB
/admin_team                                1.0TB       90%      17.0GB 1007.0GB
/amyloid_team                              7.0TB       90%       6.4TB 577.5GB
/amyloid_team/nguyen                       4.0TB       90%       3.7TB 312.7GB
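Over-limit entries in a listing like this can be picked out automatically. A minimal sketch, assuming the five-column layout shown above (Path, Hard-limit, Soft-limit, Used, Available), size suffixes from Bytes to PB, and powers of 1024; quota_over_limit is a hypothetical helper, not part of the gluster CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gluster volume quota <vol> list` output on stdin
# and print every path whose Used column exceeds its Hard-limit column.
# Assumes columns: Path, Hard-limit, Soft-limit, Used, Available.
quota_over_limit() {
  LC_ALL=C awk '
    # Convert a size such as "25.5TB" or "1007.0GB" to bytes (powers of 1024 assumed).
    function to_bytes(s,   n, u) {
      n = s + 0                     # awk keeps the leading numeric prefix
      u = s
      sub(/^[0-9.]+/, "", u)        # what remains is the unit suffix
      if (u == "KB") return n * 1024
      if (u == "MB") return n * 1024^2
      if (u == "GB") return n * 1024^3
      if (u == "TB") return n * 1024^4
      if (u == "PB") return n * 1024^5
      return n                      # "Bytes" or no suffix
    }
    # Data rows start with an absolute path; header and dashed separator are skipped.
    $1 ~ /^\// && NF >= 5 {
      if (to_bytes($4) > to_bytes($2))
        printf "%s used=%s hard-limit=%s\n", $1, $4, $2
    }'
}
```

Usage: gluster volume quota vol_home list | quota_over_limit. On the listing above it reports only /sterpone_team.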


[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/sterpone_team
cl-storage1: 3,1T       /export/brick_home/brick1/sterpone_team
cl-storage1: 2,3T       /export/brick_home/brick2/sterpone_team
cl-storage3: 2,7T       /export/brick_home/brick1/sterpone_team
cl-storage3: 2,9T       /export/brick_home/brick2/sterpone_team
=> ~11TB (not 25.5TB!!!)
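Rather than adding the brick figures by hand, the pdsh/du output can be summed with awk. A minimal sketch, assuming the "host: size path" lines shown above, a comma as decimal separator (French locale) and K/M/G/T suffixes in powers of 1024; sum_brick_du is a hypothetical name. Like the commands in this message, it assumes cl-storage1 and cl-storage3 together hold one copy of every replica pair, so the total approximates the logical usage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum "host: <size> <path>" lines produced by
# `pdsh ... du -sh` and print the total in TiB (du -h uses powers of 1024).
sum_brick_du() {
  LC_ALL=C awk '
    {
      s = $2
      gsub(/,/, ".", s)             # French locale prints 3,1T instead of 3.1T
      unit = substr(s, length(s), 1)
      n = substr(s, 1, length(s) - 1) + 0
      if (unit == "K") n *= 1024
      else if (unit == "M") n *= 1024^2
      else if (unit == "G") n *= 1024^3
      else if (unit == "T") n *= 1024^4
      else n = s + 0                # no suffix (e.g. "0")
      total += n
    }
    END { printf "%.1fT\n", total / 1024^4 }'
}
```

Piping the sterpone_team output above into sum_brick_du gives 11.0T. Running du -sk instead of du -sh on the bricks would avoid both the suffix parsing and the rounding of human-readable sizes.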


[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/baaden_team
cl-storage1: 4,2T       /export/brick_home/brick1/baaden_team
cl-storage3: 3,7T       /export/brick_home/brick1/baaden_team
cl-storage1: 3,6T       /export/brick_home/brick2/baaden_team
cl-storage3: 3,5T       /export/brick_home/brick2/baaden_team
=> ~15TB (not 14TB).

Etc.

Could you please help me solve this issue urgently? The situation is blocking, 
and I have to stop production until it is fixed.

Do you think upgrading the storage cluster to GlusterFS 3.7.1 (the latest 
version) could fix the problem?

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Head of IT & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]

On 27 June 2015 at 08:13, Krutika Dhananjay <[email protected]> wrote:

> Copying Vijai and Raghavendra for help...
> 
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Saturday, June 27, 2015 2:13:52 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Hi Krutika,
> 
> Since I re-enabled the quota feature on my volume vol_home, one of the defined 
> quotas has gone haywire… and it’s a very, very big problem for us.
> 
> Throughout the day after re-enabling it, I watched the detected used space 
> keep growing (without any user I/O).
> 
> [root@lucifer ~]# gluster volume quota vol_home list|grep derreumaux_team
> /derreumaux_team                          14.0TB       80%      13.7TB 357.2GB
> [root@lucifer ~]# gluster volume quota vol_home list /derreumaux_team
>                   Path                   Hard-limit Soft-limit   Used  Available
> --------------------------------------------------------------------------------
> /derreumaux_team                          14.0TB       80%      13.1TB 874.1GB
> [root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/derreumaux_team
> cl-storage3: 590G /export/brick_home/brick1/derreumaux_team
> cl-storage3: 611G /export/brick_home/brick2/derreumaux_team
> cl-storage1: 567G /export/brick_home/brick1/derreumaux_team
> cl-storage1: 564G /export/brick_home/brick2/derreumaux_team
> 
> As you can see, these 3 commands give 3 different results and, worse, the 
> quota system is very far from the real disk usage 
> (13.7TB <> 13.1TB <<>> 2.3TB).
> 
> Can you please help me fix this very quickly? This whole group is 
> completely blocked by the exceeded quota.
> 
> Thank you so much in advance,
> Have a nice weekend,
> Geoffrey
> 
> On 26 June 2015 at 10:29, Krutika Dhananjay <[email protected]> wrote:
> 
> No, but if you are saying it is the 3.5.3 RPM version, then that bug does not 
> exist there.
> Still, it is strange that you are seeing such bad performance. :-/
> Anything suspicious in the logs?
> 
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Friday, June 26, 2015 1:27:16 PM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> No, it’s the 3.5.3 RPM version I found in your repository (published in 
> November 2014).
> So you suggest I simply upgrade all servers and clients to the new 
> 3.5.4 version? Wouldn't it be better to upgrade the whole system (servers and 
> clients) to 3.7.1?
> 
> Geoffrey
> 
> On 26 June 2015 at 09:03, Krutika Dhananjay <[email protected]> wrote:
> 
> Also, are you running the 3.5.3 RPMs on the clients? Or is it a patched 
> version with more fixes on top of 3.5.3?
> I ask because a performance issue was introduced in the replication module 
> after 3.5.3 and fixed in 3.5.4. I'm wondering whether that could be causing 
> the issue you are seeing.
> 
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Friday, June 26, 2015 10:05:26 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Hi Krutika,
> 
> Oops, I disabled quota without saving its configuration. Could you tell me 
> how to retrieve the quota list information?
> 
> I’m going to test the untar in the meantime.
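Regarding saving the quota configuration: running gluster volume quota vol_home list > quota_limits.txt before disabling quota keeps the limits in a file, and they can later be re-applied with gluster volume quota vol_home limit-usage <path> <hard-limit> [<soft-limit>]. A minimal sketch that turns such a saved listing back into limit-usage commands; quota_restore_cmds is a hypothetical helper, the commands are only printed for review, and whether a given release accepts decimal sizes such as 14.0TB and a soft limit without the % sign is an assumption to check against the CLI help of the installed version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a saved `gluster volume quota <vol> list` output
# on stdin and print one limit-usage command per quota entry.
# Commands are printed for review only; nothing is executed.
quota_restore_cmds() {
  local vol="$1"
  awk -v vol="$vol" '
    # Data rows start with an absolute path; header and separator are skipped.
    $1 ~ /^\// && NF >= 3 {
      soft = $3
      sub(/%$/, "", soft)           # assumption: soft limit given as a bare percentage
      printf "gluster volume quota %s limit-usage %s %s %s\n", vol, $1, $2, soft
    }'
}
```

For example: quota_restore_cmds vol_home < quota_limits.txt, then review the printed lines before running them.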
> 
> Geoffrey
> 
> On 26 June 2015 at 04:56, Krutika Dhananjay <[email protected]> wrote:
> 
> Hi,
> 
> So I tried a kernel source tree untar locally on a plain replicate (1x2) 
> volume, and it took 7m30s on average. This was on VMs, with no RDMA and no 
> quota enabled.
> Could you try the same thing on a volume without quota to see whether it 
> makes a difference to the performance?
> 
> -Krutika
> 
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Wednesday, June 24, 2015 10:21:13 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Hi Krutika,
> 
> OK, thank you very much in advance.
> Regarding the quota system, are you in touch with Vijaykumar? I have been 
> waiting for an answer for a couple of days, if not longer.
> 
> One more time, thank you.
> Have a nice day (in France it’s 6:50 AM).
> Geoffrey
> 
> On 24 June 2015 at 05:55, Krutika Dhananjay <[email protected]> wrote:
> 
> OK, so for anything related to replication, I can help you out.
> But for quota, it would be better to ask Vijaikumar Mallikarjuna or 
> Raghavendra G on the mailing list.
> I used to work on quota a long time back, but I am no longer in touch with 
> the component and do not know about its latest changes.
> For the performance issue, I will try a Linux kernel source untar on my 
> machines and let you know what I find.
> 
> -Krutika
> 
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Monday, June 22, 2015 9:00:52 PM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Hi Krutika,
> 
> Sorry for the delay, I was in meetings all day.
> 
> > Good to hear from you as well. :)
> ;-)
> > So you are seeing this bad performance only in 3.5.3? Any other releases you 
> > tried this test on, where the results were much better with replication?
> Yes, but I’m not sure my issue is specific to this release. A few days ago, 
> the untar (with the same GlusterFS version) took around 8 minutes; now it 
> takes around 32 minutes. 8 minutes was already too long, so what about 32? :)
> 
> That said, my problem only concerns small files: if I use dd (or similar) 
> with a single big file, everything is fine (client write throughput ~1GB/s, 
> i.e. ~500MB/s to each replica).
> 
> If I run my benchmark on my distribute-only volume, I get good performance 
> (untar: ~1m44s, etc.).
> 
> In addition, I don’t know whether it matters, but I have some trouble with 
> GlusterFS group quotas: there are many mismatches between the quota size and 
> the actual file size, and many "quota xattrs not found" messages from the 
> quota-verify GlusterFS tool. You can find an extract of the quota-verify 
> output attached.
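One way to inspect the accounting that quota-verify complains about is to read the quota size extended attribute directly on each brick, for instance with getfattr -n trusted.glusterfs.quota.size -e hex <brick-directory>, and compare it with du for the same directory. The sketch below only decodes such a value. The attribute name and its layout (a big-endian signed 64-bit byte count in the first 8 bytes, possibly followed by extra counters in later releases) are assumptions to verify against the installed version, and decode_quota_size is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the hex value printed by
#   getfattr -n trusted.glusterfs.quota.size -e hex <brick-dir>
# Assumption: the first 8 bytes hold the accounted size as a big-endian
# signed 64-bit integer (later releases may append extra counters after it).
decode_quota_size() {
  local line="$1" hex
  hex="${line#*=0x}"              # drop "trusted.glusterfs.quota.size=0x"
  hex="${hex:0:16}"               # keep only the first 8 bytes (16 hex digits)
  echo "$((16#$hex))"             # bash arithmetic is signed 64-bit
}
```

A negative or wildly large result on one brick, or a value far from the du figure for the same directory, would point at the on-disk accounting rather than at the data itself.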
> 
> > If so, could you please let me know? Meanwhile let me try the untar myself on 
> > my vms to see what could be causing the perf issue.
> OK, thanks. 
> 
> See you,
> Geoffrey
> 
> On 22 June 2015 at 11:35, Krutika Dhananjay <[email protected]> wrote:
> 
> Hi Geoffrey,
> 
> Good to hear from you as well. :)
> Ok so you say disabling write-behind does not help. Makes me wonder what the 
> problem could be.
> So you are seeing this bad performance only in 3.5.3? Any other releases you 
> tried this test on, where the results were much better with replication?
> If so, could you please let me know? Meanwhile let me try the untar myself on 
> my vms to see what could be causing the perf issue.
> 
> -Krutika
> 
> From: "Geoffrey Letessier" <[email protected]>
> To: "Krutika Dhananjay" <[email protected]>
> Sent: Monday, June 22, 2015 10:14:26 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Hi Krutika,
> 
> It’s good to hear from you again :)
> 
> Here are my answers:
> 1- Could you remind me how to tell whether a self-heal is currently in 
> progress? I don’t see anything special, neither a mount point (except the 
> /var/run/gluster/vol_home one) nor a dedicated process, but maybe I’m looking 
> in the wrong place.
> 2- OK, I just disabled the write-behind option and reran the benchmark. I’ll 
> tell you more when I get to my office (I’m still at home right now).
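On the first question, gluster volume heal vol_home info lists, for each brick, the entries still waiting to be healed. A minimal sketch that totals those entries; it assumes the GlusterFS 3.x output layout, where each brick section has a "Brick <host>:<path>" line and a "Number of entries: <N>" line, and pending_heals is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise `gluster volume heal <vol> info` output read
# on stdin. Prints each brick with pending entries, then the overall total.
# Assumes "Brick <host>:<path>" and "Number of entries: <N>" lines per brick.
pending_heals() {
  awk '
    /^Brick /             { brick = $2 }
    /^Number of entries:/ { n = $NF + 0; total += n
                            if (n > 0) printf "%s %d\n", brick, n }
    END                   { printf "total %d\n", total }'
}
```

For example: gluster volume heal vol_home info | pending_heals. A total that stays above zero while the benchmark runs would suggest self-heal traffic is competing with the untar.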
> 
> See you, and thank you for helping.
> Geoffrey
> 
> On 22 June 2015 at 04:35, Krutika Dhananjay <[email protected]> wrote:
> 
> Hi Geoffrey,
> 
> 1. Was self-heal also in progress while I/O was happening on the volume?
> 2. Also, there seem to be quite a few fsyncs, which could have slowed
>    things down a bit. Could you disable write-behind and get the timing
>    stats once more? That would rule out write-behind causing out-of-order
>    writes that increase the number of fsyncs issued by the replication
>    module.
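The second suggestion amounts to toggling one volume option around the benchmark. A minimal sketch, assuming the volume name vol_home; performance.write-behind, gluster volume set and gluster volume reset are standard GlusterFS CLI, while the run wrapper and the DRY_RUN variable are hypothetical conveniences that print the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print each command, and execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

# Disable write-behind for the test, run the benchmark, then restore the default.
bench_without_write_behind() {
  local vol="$1"; shift
  run gluster volume set "$vol" performance.write-behind off
  run "$@"                                  # the benchmark command, e.g. ./mybench.sh
  run gluster volume reset "$vol" performance.write-behind
}
```

For example: DRY_RUN=1 bench_without_write_behind vol_home ./mybench.sh shows the three steps; dropping DRY_RUN runs them. Resetting the option rather than setting it back to "on" returns it to the release default.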
> 
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: [email protected]
> Sent: Saturday, June 20, 2015 6:04:40 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Re,
> 
> For comparison, here is the output of the same script run on a 
> distribute-only volume (2 of the 4 servers described previously, 2 bricks each):
> #######################################################
> ################  UNTAR time consumed  ################
> #######################################################
> 
> 
> real 1m44.698s
> user 0m8.891s
> sys 0m8.353s
> 
> #######################################################
> #################  DU time consumed  ##################
> #######################################################
> 
> 554M linux-4.1-rc6
> 
> real 0m21.062s
> user 0m0.100s
> sys 0m1.040s
> 
> #######################################################
> #################  FIND time consumed  ################
> #######################################################
> 
> 52663
> 
> real 0m21.325s
> user 0m0.104s
> sys 0m1.054s
> 
> #######################################################
> #################  GREP time consumed  ################
> #######################################################
> 
> 7952
> 
> real 0m43.618s
> user 0m0.922s
> sys 0m3.626s
> 
> #######################################################
> #################  TAR time consumed  #################
> #######################################################
> 
> 
> real 0m50.577s
> user 0m29.745s
> sys 0m4.086s
> 
> #######################################################
> #################  RM time consumed  ##################
> #######################################################
> 
> 
> real 0m41.133s
> user 0m0.171s
> sys 0m2.522s
> 
> The performance is strikingly different!
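To put numbers on the difference, the real times of the two runs quoted in this thread (replicated volume in the original message, distribute-only volume here) can be converted to seconds and divided. A small sketch; to_seconds and slowdown are hypothetical helper names, and they assume the "XmY.Zs" format printed by the bash time keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a bash `time` value such as "32m42.967s" to seconds.
to_seconds() {
  LC_ALL=C awk -v t="$1" 'BEGIN {
    split(t, p, "m")                # p[1] = minutes, p[2] = "42.967s"
    sub(/s$/, "", p[2])
    printf "%.3f\n", p[1] * 60 + p[2]
  }'
}

# Ratio of two `time` values (first / second), one decimal place.
slowdown() {
  LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}
```

With the values from this thread, slowdown 32m42.967s 1m44.698s gives 18.7 for the untar phase, against 4.2 for rm (2m51.485s vs 0m41.133s) and 3.1 for grep (2m15.890s vs 0m43.618s).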
> 
> Geoffrey
> 
> On 20 June 2015 at 02:12, Geoffrey Letessier <[email protected]> wrote:
> 
> Dear all,
> 
> I just noticed that I/O operations on the main volume of my HPC cluster have 
> become impressively slow.
> 
> While running some file operations on a compressed Linux kernel source 
> archive, I found that the untar step can take more than half an hour for this 
> file (roughly 80MB, with 52,000 files inside), as you can see below:
> #######################################################
> ################  UNTAR time consumed  ################
> #######################################################
> 
> 
> real 32m42.967s
> user 0m11.783s
> sys 0m15.050s
> 
> #######################################################
> #################  DU time consumed  ##################
> #######################################################
> 
> 557M linux-4.1-rc6
> 
> real 0m25.060s
> user 0m0.068s
> sys 0m0.344s
> 
> #######################################################
> #################  FIND time consumed  ################
> #######################################################
> 
> 52663
> 
> real 0m25.687s
> user 0m0.084s
> sys 0m0.387s
> 
> #######################################################
> #################  GREP time consumed  ################
> #######################################################
> 
> 7952
> 
> real 2m15.890s
> user 0m0.887s
> sys 0m2.777s
> 
> #######################################################
> #################  TAR time consumed  #################
> #######################################################
> 
> 
> real 1m5.551s
> user 0m26.536s
> sys 0m2.609s
> 
> #######################################################
> #################  RM time consumed  ##################
> #######################################################
> 
> 
> real 2m51.485s
> user 0m0.167s
> sys 0m1.663s
> 
> For information, this is a distributed-replicated volume composed of 
> 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 virtual disk with 
> good native performance (around 1.2GB/s).
> 
> By comparison, when I use dd to write a 100GB file to the same volume, my 
> write throughput is around 1GB/s on the client side and 500MB/s on the server 
> side, because of replication:
> Client side:
> [root@node056 ~]# ifstat -i ib0
>        ib0        
>  KB/s in  KB/s out
>  3251.45  1.09e+06
>  3139.80  1.05e+06
>  3185.29  1.06e+06
>  3293.84  1.09e+06
> ...
> 
> Server side:
> [root@lucifer ~]# ifstat -i ib0
>        ib0        
>  KB/s in  KB/s out
> 561818.1   1746.42
> 560020.3   1737.92
> 526337.1   1648.20
> 513972.7   1613.69
> ...
> 
> DD command:
> [root@node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
> 100000+0 records in
> 100000+0 records out
> 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
> 
> So the issue doesn’t seem to come from the network (which is InfiniBand 
> in this case).
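The replication overhead can be checked from the figures above: dd's rate is bytes written divided by elapsed time, and with two replicas the client should send roughly twice what each server of the target replica pair receives. A small sketch using the numbers quoted here; dd_rate and mean_ratio are hypothetical helper names, and because both ifstat columns use the same unit, the ratio does not depend on that unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dd throughput in MB/s (decimal megabytes, as dd reports it).
dd_rate() {
  LC_ALL=C awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1e6 }'
}

# Hypothetical helper: ratio of the means of two space-separated sample lists,
# e.g. the client ifstat "out" column vs the server ifstat "in" column.
mean_ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, " "); nb = split(b, y, " ")
    for (i = 1; i <= na; i++) sa += x[i]     # awk parses 1.09e+06 as a number
    for (i = 1; i <= nb; i++) sb += y[i]
    printf "%.2f\n", (sa / na) / (sb / nb)
  }'
}
```

With the values above, dd_rate 104857600000 202.99 gives 516.6 MB/s, and the client-out to server-in ratio of the four ifstat samples is about 1.98: each server of the pair receives roughly half of what the client sends, as expected with replica 2.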
> 
> Attached you will find a set of files:
>  - mybench.sh: the bench script
>  - benches.txt: output of my "bench"
>  - profile.txt: gluster volume profile during the "bench"
>  - vol_status.txt: gluster volume status
>  - vol_info.txt: gluster volume info
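mybench.sh is only attached, not reproduced in the archive. For anyone wanting to repeat the comparison (for instance on a volume without quota, as Krutika suggested), the phases visible in the output (untar, du, find, grep, tar, rm) can be approximated as follows. This is a hypothetical reconstruction, not the attached script: the function name, the default grep pattern and the use of gzip for the tar phase are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the benchmark phases reported in this thread;
# NOT the attached mybench.sh. Runs in a subshell so the caller's cwd is kept.
# Usage: small_file_bench <absolute-path-to-tarball> <dir-on-gluster-mount> [grep-pattern]
small_file_bench() (
  local tarball="$1" workdir="$2" pattern="${3:-include}" top
  cd "$workdir" || exit 1
  # Top-level directory inside the archive (e.g. linux-4.1-rc6).
  top=$(tar -tf "$tarball" | head -n 1 | cut -d/ -f1)

  echo "### UNTAR"; time tar -xf "$tarball"
  echo "### DU";    time du -sh "$top"
  echo "### FIND";  time find "$top" | wc -l
  echo "### GREP";  time grep -rl "$pattern" "$top" | wc -l   # pattern is an assumption
  echo "### TAR";   time tar -czf "$top.tar.gz" "$top"        # gzip is an assumption
  echo "### RM";    time rm -rf "$top" "$top.tar.gz"
)
```

The timings go to stderr via the bash time keyword; running the same call on the replicated volume, the distribute-only volume and a quota-free volume gives directly comparable numbers.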
> 
> Can someone help me fix this? It’s very critical because this volume is on an 
> HPC cluster in production.
> 
> Thanks in advance,
> Geoffrey
> <benches.txt>
> 
> <mybench.sh>
> 
> <profile.txt>
> 
> <vol_info.txt>
> 
> <vol_status.txt>
> 
> 
> 
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-users
