Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance

Geoffrey Letessier Mon, 22 Jun 2015 01:01:51 -0700

Hi Krutika,

Sorry for the delay; I have been swamped since I got to the office :/


Here are the results of the "bench" with write-behind disabled:
#######################################################
################  UNTAR time consumed  ################
#######################################################


real    31m34.948s
user    0m11.245s
sys     0m21.880s

#######################################################
#################  DU time consumed  ##################
#######################################################

557M    linux-4.1-rc6

real    0m48.851s
user    0m0.117s
sys     0m1.097s

#######################################################
#################  FIND time consumed  ################
#######################################################

52663

real    0m45.922s
user    0m0.278s
sys     0m1.547s

#######################################################
#################  GREP time consumed  ################
#######################################################

7952

real    4m30.424s
user    0m0.933s
sys     0m4.884s

#######################################################
#################  TAR time consumed  #################
#######################################################


real    5m19.299s
user    0m30.281s
sys     0m5.786s

#######################################################
#################  RM time consumed  ##################
#######################################################


real    9m4.438s
user    0m0.451s
sys     0m5.523s
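
For a quick side-by-side, the "real" times above can be compared with the
distributed-only run quoted further down in this message. The snippet below is
only a sketch: the helper names to_seconds and slowdown are made up for
illustration, and the two runs used a different number of servers, so the
ratios are indicative at best.

```shell
#!/bin/sh
# Convert a `time`-style value such as "31m34.948s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio between two `time`-style values (first divided by second).
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# Columns: operation, replicated volume with write-behind off (this run),
# distributed-only volume (run of 20 June, quoted below).
while read -r op replicated distributed; do
    printf '%-6s %6sx\n' "$op" "$(slowdown "$replicated" "$distributed")"
done <<'EOF'
untar 31m34.948s 1m44.698s
du    0m48.851s  0m21.062s
find  0m45.922s  0m21.325s
grep  4m30.424s  0m43.618s
tar   5m19.299s  0m50.577s
rm    9m4.438s   0m41.133s
EOF
```

Each output line gives how many times longer the operation took on the
replicated volume than on the distributed-only one.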

The results are very bad, even worse than the previous ones.
Because of this, I set the write-behind option back to on.
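
For anyone repeating the test, toggling write-behind could look roughly like
the sketch below. It assumes the standard volume option name
performance.write-behind; VOLNAME is a placeholder because the real volume
name does not appear in this thread, and the wrapper function
set_write_behind is purely illustrative.

```shell
#!/bin/sh
# Hypothetical wrapper around `gluster volume set` for the write-behind
# translator. The first argument is the volume name, the second is on or off.
set_write_behind() {
    volname=$1
    state=$2
    case "$state" in
        on|off) ;;
        *) echo "usage: set_write_behind VOLNAME on|off" >&2; return 2 ;;
    esac
    gluster volume set "$volname" performance.write-behind "$state"
}

# Example (run as root on a server of the trusted pool):
#   set_write_behind VOLNAME off   # before running the bench
#   set_write_behind VOLNAME on    # restore the setting afterwards
#   gluster volume info VOLNAME    # the option shows up under "Options Reconfigured"
```

The option takes effect on the running volume, so no restart of the volume
should be needed between the two runs.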

Here is the only process I found that mentions self-heal (where it is set to
off?!):
# ps aux|grep heal
root      9437  0.0  0.0 296132 45620 ?        Ssl  May12   6:29 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p 
/var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S 
/var/run/71b4e5df9ffee9a41bcf0d94b98dc558.socket --xlator-option 
*replicate*.data-self-heal=off --xlator-option 
*replicate*.metadata-self-heal=off --xlator-option 
*replicate*.entry-self-heal=off
root     42532  0.0  0.0 105324   940 pts/7    S+   09:51   0:00 grep heal
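
A note on the output above: the process matched by "grep heal" is the quota
daemon (quotad), and as far as I can tell its --xlator-option flags only
affect that process. The self-heal daemon normally runs as a separate
glusterfs process whose volfile-id is gluster/glustershd. A minimal sketch for
looking for it follows; the function name shd_processes is made up and VOLNAME
is a placeholder for the real volume name.

```shell
#!/bin/sh
# Keep only the lines of `ps` output (read from stdin) that belong to the
# self-heal daemon. The [g] in the pattern stops grep from matching itself.
shd_processes() {
    grep 'volfile-id gluster/[g]lustershd'
}

# Example:
#   ps -eo pid,args | shd_processes
#
# The gluster CLI gives the same information per volume:
#   gluster volume status VOLNAME      # has a "Self-heal Daemon" row per server
#   gluster volume heal VOLNAME info   # entries still waiting to be healed
```

If "heal info" reports pending entries while the bench runs, that would answer
question 1 in the quoted message below.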


Thanks in advance for your help with a fix.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]

> On 22 June 2015 at 04:35, Krutika Dhananjay <[email protected]> wrote:
> 
> Hi Geoffrey,
> 
> 1. Was self-heal also in progress while I/O was happening on the volume?
> 2. Also, there seem to be quite a few fsyncs, which could have slowed things
>    down a bit. Could you disable write-behind and collect the time stats once
>    more? That would rule out write-behind causing out-of-order writes, which
>    increase the number of fsyncs issued by the replication module.
> 
> -Krutika
> From: "Geoffrey Letessier" <[email protected]>
> To: [email protected]
> Sent: Saturday, June 20, 2015 6:04:40 AM
> Subject: Re: [Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
> 
> Hi again,
> 
> For comparison, here is the output of the same script run on a
> distributed-only volume (2 of the 4 servers described previously, 2 bricks
> each):
> #######################################################
> ################  UNTAR time consumed  ################
> #######################################################
> 
> 
> real  1m44.698s
> user  0m8.891s
> sys   0m8.353s
> 
> #######################################################
> #################  DU time consumed  ##################
> #######################################################
> 
> 554M  linux-4.1-rc6
> 
> real  0m21.062s
> user  0m0.100s
> sys   0m1.040s
> 
> #######################################################
> #################  FIND time consumed  ################
> #######################################################
> 
> 52663
> 
> real  0m21.325s
> user  0m0.104s
> sys   0m1.054s
> 
> #######################################################
> #################  GREP time consumed  ################
> #######################################################
> 
> 7952
> 
> real  0m43.618s
> user  0m0.922s
> sys   0m3.626s
> 
> #######################################################
> #################  TAR time consumed  #################
> #######################################################
> 
> 
> real  0m50.577s
> user  0m29.745s
> sys   0m4.086s
> 
> #######################################################
> #################  RM time consumed  ##################
> #######################################################
> 
> 
> real  0m41.133s
> user  0m0.171s
> sys   0m2.522s
> 
> The performance is strikingly different!
> 
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> 
> IT manager & systems engineer
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: [email protected]
> 
> On 20 June 2015 at 02:12, Geoffrey Letessier <[email protected]> wrote:
> 
> Dear all,
> 
> I just noticed that I/O performance on the main volume of my HPC cluster has
> become extremely poor.
> 
> When I run some file operations on a compressed Linux kernel source archive,
> the untar alone can take more than half an hour (the archive is roughly 80 MB
> and contains about 52,000 files), as you can see below:
> #######################################################
> ################  UNTAR time consumed  ################
> #######################################################
> 
> 
> real  32m42.967s
> user  0m11.783s
> sys   0m15.050s
> 
> #######################################################
> #################  DU time consumed  ##################
> #######################################################
> 
> 557M  linux-4.1-rc6
> 
> real  0m25.060s
> user  0m0.068s
> sys   0m0.344s
> 
> #######################################################
> #################  FIND time consumed  ################
> #######################################################
> 
> 52663
> 
> real  0m25.687s
> user  0m0.084s
> sys   0m0.387s
> 
> #######################################################
> #################  GREP time consumed  ################
> #######################################################
> 
> 7952
> 
> real  2m15.890s
> user  0m0.887s
> sys   0m2.777s
> 
> #######################################################
> #################  TAR time consumed  #################
> #######################################################
> 
> 
> real  1m5.551s
> user  0m26.536s
> sys   0m2.609s
> 
> #######################################################
> #################  RM time consumed  ##################
> #######################################################
> 
> 
> real  2m51.485s
> user  0m0.167s
> sys   0m1.663s
> 
> For information, this is a distributed-replicated volume made up of 4 servers
> with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with good native
> performance (around 1.2 GB/s).
> 
> In comparison, when I use dd to generate a 100 GB file on the same volume, my
> write throughput is around 1 GB/s on the client side and 500 MB/s on the
> server side, because of replication:
> Client side:
> [root@node056 ~]# ifstat -i ib0
>        ib0        
>  KB/s in  KB/s out
>  3251.45  1.09e+06
>  3139.80  1.05e+06
>  3185.29  1.06e+06
>  3293.84  1.09e+06
> ...
> 
> Server side:
> [root@lucifer ~]# ifstat -i ib0
>        ib0        
>  KB/s in  KB/s out
> 561818.1   1746.42
> 560020.3   1737.92
> 526337.1   1648.20
> 513972.7   1613.69
> ...
> 
> DD command:
> [root@node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
> 100000+0 records in
> 100000+0 records out
> 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
> 
> So the issue does not seem to come from the network (which is InfiniBand in
> this case).
> 
> The following files are attached:
>       - mybench.sh: the bench script
>       - benches.txt: output of my "bench"
>       - profile.txt: gluster volume profile during the "bench"
>       - vol_status.txt: gluster volume status
>       - vol_info.txt: gluster volume info
> 
> Can someone help me fix this? It is critical, because this volume is on an
> HPC cluster in production.
> 
> Thanks in advance,
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> 
> IT manager & systems engineer
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: [email protected]
> <benches.txt>
> <mybench.sh>
> <profile.txt>
> <vol_info.txt>
> <vol_status.txt>
> 
> 
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-users
>
