Re: [Gluster-users] Cascading errors and very bad write performance

Vijaikumar M Fri, 07 Aug 2015 05:57:41 -0700


On Friday 07 August 2015 05:34 PM, Geoffrey Letessier wrote:

Hi Vijay,
My brick-log issue and the big performance problem began when I upgraded Gluster to version 3.7.3. Before that, write throughput was good enough (~500 MB/s), though not as good as with GlusterFS 3.5.3 (especially with distributed volumes), and I didn't notice these problems with the brick logs.
OK, here is what I am seeing live:
I just disabled quota on my home volume, and performance now appears relatively better (around 300 MB/s), but I still see the logs (on storage1 and its replica, storage2) growing with only this kind of line:
[2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref](-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f85e9a6a410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f85e9a6a188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]

We have root-caused the log issue; bug #1244613 tracks it.

After a few minutes, my write throughput now seems correct (~550 MB/s), but the logs are still growing (not to say exploding). So part of the problem appears to originate in quota management. After a few more minutes (and still with only one client connected), it is now the read operation that is very, very slow. I'm going crazy! :/
# ddt -t 50g /home/
Writing to /home/ddt.11293 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.11293 ... done.
35840MiB    KiB/s  CPU%
Write      568201     5
Read       567008     4
# ddt -t 50g /home/
Writing to /home/ddt.11397 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.11397 ... done.
51200MiB    KiB/s  CPU%
Write      573631     5
Read       164716     1
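A small Python sketch can make the two runs above easier to compare. It assumes the ddt summary always has the three columns shown (operation, KiB/s, CPU%); `parse_ddt_summary` is a hypothetical helper, not part of ddt. Throughput is converted to MiB/s (KiB/s divided by 1024), which appears to match the "~550MBs" figures quoted in this thread.

```python
import re

# Matches the "Write"/"Read" rows of a ddt summary, e.g. "Write      568201     5".
# Column layout (operation, KiB/s, CPU%) is assumed from the transcripts above.
ROW = re.compile(r"^(Write|Read)\s+(\d+)\s+(\d+)$")


def parse_ddt_summary(text):
    """Return {'Write': MiB/s, 'Read': MiB/s} from a ddt summary block."""
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            # KiB/s -> MiB/s
            rates[m.group(1)] = int(m.group(2)) / 1024
    return rates


# The two consecutive runs quoted above (figures copied from the transcript).
run_1 = """35840MiB    KiB/s  CPU%
Write      568201     5
Read       567008     4"""
run_2 = """51200MiB    KiB/s  CPU%
Write      573631     5
Read       164716     1"""

r1, r2 = parse_ddt_summary(run_1), parse_ddt_summary(run_2)
for op in ("Write", "Read"):
    ratio = r2[op] / r1[op]
    print(f"{op}: {r1[op]:.0f} -> {r2[op]:.0f} MiB/s ({ratio:.0%} of run 1)")
```

With these numbers, write throughput is essentially unchanged between the two runs, while read throughput falls to roughly 29% of the first run.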

And my logs are still exploding…

After having re-enabled the quota on my volume:
# ddt -t 50g /home/
Writing to /home/ddt.11817 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.11817 ... done.
51200MiB    KiB/s  CPU%
Write      269608     3
Read       160219     1
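To quantify the quota effect, a minimal sketch can compute the relative change between the run with quota disabled (ddt.11397) and the run with quota re-enabled (ddt.11817). The KiB/s figures are copied from the transcripts; with a single run each, the result is indicative only.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# KiB/s figures copied from the ddt transcripts above.
quota_off = {"Write": 573631, "Read": 164716}  # ddt.11397, quota disabled
quota_on = {"Write": 269608, "Read": 160219}   # ddt.11817, quota re-enabled

for op in ("Write", "Read"):
    print(f"{op}: {pct_change(quota_off[op], quota_on[op]):+.1f}%")
```

This gives about -53% for writes and about -3% for reads, which is consistent with (though does not prove) the quota overhead sitting mainly on the write path.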

Thanks
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Head of IT & Systems Engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]
On 7 August 2015 at 06:28, Vijaikumar M <[email protected]> wrote:
Hi Geoffrey,

Some performance improvements have been made to quota in glusterfs-3.7.3.
Could you upgrade to glusterfs-3.7.3 and see if this helps?

Thanks,
Vijay


On Friday 07 August 2015 05:02 AM, Geoffrey Letessier wrote:
Hi,
Does anyone have an idea how to fix this issue (huge logs, write performance divided by 4, etc.)?
For comparison, here are two volumes:
- home: distributed on 4 bricks / 2 nodes (and replicated on 4 other bricks / 2 other nodes):
# ddt -t 35g /home
Writing to /home/ddt.24172 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.24172 ... done.
33792MiB    KiB/s  CPU%
Write      103659     1
Read       391955     3
- workdir: distributed on 4 bricks / 2 nodes (on the same RAID volumes and servers as home):
# ddt -t 35g /workdir
Writing to /workdir/ddt.24717 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /workdir/ddt.24717 ... done.
35840MiB    KiB/s  CPU%
Write      738314     4
Read       536497     4
For reference, previously with version 3.5.3-2, I obtained roughly 1.1 GB/s for the workdir volume and ~550-600 MB/s for home.
All my tests (cp, rsync, etc.) give the same result (write throughput between 100 MB/s and 150 MB/s).
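A rough way to judge whether the home numbers are merely the expected cost of replication: this sketch assumes that replication is done client-side (the replicate translator in the client sends each write to every replica) and that the client link is the bottleneck, so a replica-2 volume should write at most about half as fast as a distribute-only volume on the same hardware. `replica_write_ceiling` is a hypothetical helper for illustration, and the model is deliberately simplistic.

```python
def replica_write_ceiling(distribute_kib_s, replica_count):
    """Rough upper bound on replicated write throughput, in KiB/s.

    Assumption: each byte is sent once per replica from the client, so if
    the client link is the bottleneck the usable write rate is the
    distribute-only rate divided by the replica count.
    """
    return distribute_kib_s / replica_count


workdir_write = 738314  # KiB/s, distribute-only volume (ddt above)
home_write = 103659     # KiB/s, distribute + replica 2 volume (ddt above)

ceiling = replica_write_ceiling(workdir_write, 2)
print(f"expected ceiling for home: ~{ceiling / 1024:.0f} MiB/s")
print(f"observed for home:         ~{home_write / 1024:.0f} MiB/s "
      f"({home_write / ceiling:.0%} of the ceiling)")
```

Under that assumption, home should reach roughly 360 MiB/s, whereas the observed ~101 MiB/s is only about 28% of that ceiling, so replication alone does not seem to explain the slowdown.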
Thanks.
Geoffrey
On 5 August 2015 at 10:40, Geoffrey Letessier <[email protected]> wrote:
Hello,
In addition, since I re-enabled logging (brick-log-level = INFO instead of CRITICAL) only for the duration of the file creation (i.e. a few minutes), have you noticed the size of the logs and the number of lines in them:
# ls -lh storage*
-rw------- 1 letessier staff 18M 5 aoû 00:54 storage1__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K 5 aoû 00:54 storage1__export-brick_home-brick2-data.log
-rw------- 1 letessier staff 15M 5 aoû 00:56 storage2__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K 5 aoû 00:54 storage2__export-brick_home-brick2-data.log
-rw------- 1 letessier staff 47M 5 aoû 00:55 storage3__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K 5 aoû 00:54 storage3__export-brick_home-brick2-data.log
-rw------- 1 letessier staff 47M 5 aoû 00:55 storage4__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K 5 aoû 00:55 storage4__export-brick_home-brick2-data.log
# wc -l storage*
   55381 storage1__export-brick_home-brick1-data.log
    17 storage1__export-brick_home-brick2-data.log
   41636 storage2__export-brick_home-brick1-data.log
    17 storage2__export-brick_home-brick2-data.log
  270360 storage3__export-brick_home-brick1-data.log
    17 storage3__export-brick_home-brick2-data.log
  270358 storage4__export-brick_home-brick1-data.log
    17 storage4__export-brick_home-brick2-data.log
  637803 total
If I leave brick-log-level at INFO, the brick log files on each server will fill my entire /var partition within just a few hours or days…
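The `wc -l` listing above can be summarized per brick, and an observed log rate can be turned into a rough time-to-fill estimate for /var. Both helpers are hypothetical; the file-name pattern is taken from the listing, while the 10 GiB of free space and the 5-minute window in the last line are purely illustrative assumptions, not measurements from this thread.

```python
def lines_per_brick(wc_output):
    """Sum `wc -l` counts per brick, skipping the final 'total' row.

    File names are assumed to follow the pattern in the listing above:
    <server>__export-brick_home-<brick>-data.log
    """
    totals = {}
    for row in wc_output.strip().splitlines():
        count, name = row.split(None, 1)
        if name == "total":
            continue
        brick = name.split("_home-", 1)[1].split("-", 1)[0]  # e.g. "brick1"
        totals[brick] = totals.get(brick, 0) + int(count)
    return totals


def hours_to_fill(free_bytes, logged_bytes, window_minutes):
    """Hours until `free_bytes` are consumed at the observed logging rate."""
    bytes_per_minute = logged_bytes / window_minutes
    return free_bytes / bytes_per_minute / 60


# `wc -l` output copied from the listing above.
wc_output = """
   55381 storage1__export-brick_home-brick1-data.log
      17 storage1__export-brick_home-brick2-data.log
   41636 storage2__export-brick_home-brick1-data.log
      17 storage2__export-brick_home-brick2-data.log
  270360 storage3__export-brick_home-brick1-data.log
      17 storage3__export-brick_home-brick2-data.log
  270358 storage4__export-brick_home-brick1-data.log
      17 storage4__export-brick_home-brick2-data.log
  637803 total
"""

print(lines_per_brick(wc_output))
# Illustrative only: 10 GiB free and a 5-minute window are hypothetical inputs;
# 47 MiB is the largest brick log size in the `ls -lh` listing above.
print(f"{hours_to_fill(10 * 2**30, 47 * 2**20, 5):.1f} h")
```

The per-brick sum shows that nearly all lines (637,735 of 637,803) come from the brick1 logs.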
Thanks in advance,
Geoffrey
On 5 August 2015 at 01:12, Geoffrey Letessier <[email protected]> wrote:
Hello,
Since the problem mentioned previously (all the errors in the brick log files), I have noticed very, very bad performance: my write performance is divided by 4 compared to before, even though it was not that good to begin with. Now, when writing a 33 GB file, my write throughput is around 150 MB/s (over InfiniBand); before, it was around 550-600 MB/s. This holds for both the RDMA and TCP transports.
During this test, more than 40,000 error lines like the following were added to the brick log files:
[2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref](-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)[0x7f021c6f7410]-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)[0x7f021c6f7188]-->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)[0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
All brick log files are in attachments.
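To see which messages dominate a brick log, a minimal Python sketch can group lines by severity and source location. The line layout (`[timestamp] LEVEL [file.c:line:function] ...`) is inferred from the error lines quoted in this thread rather than from a GlusterFS specification, `summarize` is a hypothetical helper, and the INFO sample line is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the error lines quoted in this thread:
#   [timestamp] LEVEL [file.c:line:function]...rest of message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[(?P<where>[^\]]+)\]"
)


def summarize(lines):
    """Count log lines per (severity letter, 'file.c:line:function')."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[(m.group("level"), m.group("where"))] += 1
    return counts


# Error line from this thread, with the backtrace shortened to "(-->...)".
error = ("[2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref](-->...)"
         " 0-dict: invalid argument: dict [Argument invalide]")
# Hypothetical INFO line, for illustration only.
info = "[2015-08-04 22:34:28.000000] I [hypothetical.c:1:example] 0-test: sample"
sample = [error, error, info, "not a log line"]

for (level, where), n in summarize(sample).most_common():
    print(f"{n:>8}  {level}  {where}")
```

To run it on a real brick log, open the file with `with open(path) as f:` and call `summarize(f).most_common(5)`, since iterating a file yields its lines.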

Thanks in advance for all your help and fix,
Best,
Geoffrey
PS: A question: is it possible to easily downgrade GlusterFS from 3.7 to a previous version (for example, v3.5)?
<bricks-logs.tgz>

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
