Re: [Gluster-users] Cascading errors and very bad write performance

Geoffrey Letessier Mon, 10 Aug 2015 08:50:05 -0700

> 1. ls -l on all the brick mounts.


# pdsh -w storage[1-4] ls -l /export/brick_home/brick{1,2}
storage4: /export/brick_home/brick1:
storage4: total 0
storage4: drwxr-xr-x 11 root root 172  7 août  01:40 data
storage4: 
storage4: /export/brick_home/brick2:
storage4: total 0
storage4: drwxr-xr-x 11 root root 172 10 août  10:39 data
storage3: /export/brick_home/brick1:
storage3: total 0
storage3: drwxr-xr-x 11 root root 172  7 août  01:40 data
storage3: 
storage3: /export/brick_home/brick2:
storage3: total 0
storage3: drwxr-xr-x 11 root root 172 10 août  10:39 data
storage1: /export/brick_home/brick1:
storage1: total 0
storage1: drwxr-xr-x 11 root root 172  7 août  13:53 data
storage1: 
storage1: /export/brick_home/brick2:
storage1: total 0
storage1: drwxr-xr-x 11 root root 172  7 août  13:09 data
storage2: /export/brick_home/brick1:
storage2: total 0
storage2: drwxr-xr-x 11 root root 172  7 août  13:53 data
storage2: 
storage2: /export/brick_home/brick2:
storage2: total 0
storage2: drwxr-xr-x 11 root root 172  7 août  13:09 data

> 2. Did you change transport of the volume with the volume on?

Effectively, i’ve re-set the network transport type of my volumes with the 
volume on with this command (as advised by Soumya Koduri)
# gluster volume set vol_home config.transport tcp,rdma

Best.
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: [email protected]

Le 10 août 2015 à 16:04, Raghavendra Talur <[email protected]> a écrit :

> Will look into the issue from RDMA side and let you know if we find something.
> 
> A few more details from your setup are required.
> 
> 1. ls -l on all the brick mounts.
> 2. Did you change transport of the volume with the volume on?
> 
> Raghavendra Talur
> 
> On Sun, Aug 9, 2015 at 2:23 PM, Geoffrey Letessier 
> <[email protected]> wrote:
> Hi Mathieu,
> 
> First of all, thanks for replying.
> 
> I’ve done your proposal but there’s no change: my brick logs are still 
> growing up in the server where the file is written, with this kind of lines:
> [...]
> [2015-08-09 08:16:57.856987] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
> 0-vol_home-marker: parent is NULL for 
> <gfid:64c302ab-2171-4656-8e5f-47e474de80b6>, aborting updation txn
> [2015-08-09 08:16:57.857691] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
> 0-vol_home-marker: parent is NULL for 
> <gfid:64c302ab-2171-4656-8e5f-47e474de80b6>, aborting updation txn
> [2015-08-09 08:16:57.858403] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
> 0-vol_home-marker: parent is NULL for 
> <gfid:64c302ab-2171-4656-8e5f-47e474de80b6>, aborting updation txn
> [2015-08-09 08:16:57.859226] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
> 0-vol_home-marker: parent is NULL for 
> <gfid:64c302ab-2171-4656-8e5f-47e474de80b6>, aborting updation txn
> [2015-08-09 08:16:57.859982] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
> 0-vol_home-marker: parent is NULL for 
> <gfid:64c302ab-2171-4656-8e5f-47e474de80b6>, aborting updating txn  
> The message "W [MSGID: 113001] 
> [posix.c:3700:posix_get_ancestry_non_directory] 0-vol_home-posix: listxattr 
> failed 
> on/export/brick_home/brick2/data/.glusterfs/64/c3/64c302ab-2171-4656-8e5f-47e474de80b6
>  [Aucun fichier ou dossier de ce type]" repeated 149711 times between 
> [2015-08-09 08:15:17.811919] and [2015-08-09 08:16:57.859754]
> [2015-08-09 08:16:59.629692] W [MSGID: 113001] 
> [posix.c:3700:posix_get_ancestry_non_directory] 0-vol_home-posix: listxattr 
> failed 
> on/export/brick_home/brick2/data/.glusterfs/64/c3/64c302ab-2171-4656-8e5f-47e474de80b6
>  [Aucun fichier ou dossier de ce type]
> [...]
> 
> and here the ddt output:
> # ddt -t 35g /home/
> Writing to /home/ddt.12247 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.12247 ... done.
> 35840MiB    KiB/s  CPU%
> Write      184737     2
> Read       484209     3
> 
> For just a write of only one 35GB file (with a blank log files before) :
> # grep "parent is NULL" 
> /var/log/glusterfs/bricks/export-brick_home-brick2-data.log|wc -l
> 286720
> 
> # grep "xattr" /var/log/glusterfs/bricks/export-brick_home-brick2-data.log|wc 
> -l
> 5
> 
> # wc -l /var/log/glusterfs/bricks/export-brick_home-brick2-data.log
> 286733 /var/log/glusterfs/bricks/export-brick_home-brick2-data.log
> 
> and the other kind of lines in the brick log file:
> # grep -vE "(xattr|parent is NULL)" 
> /var/log/glusterfs/bricks/export-brick_home-brick2-data.log
> [2015-08-09 08:13:16.368705] I [MSGID: 115034] 
> [server.c:397:_check_for_auth_option] 0-/export/brick_home/brick2/data: skip 
> format check for non-addr auth option 
> auth.login./export/brick_home/brick2/data.allow
> [2015-08-09 08:13:16.368858] I [MSGID: 115034] 
> [server.c:397:_check_for_auth_option] 0-/export/brick_home/brick2/data: skip 
> format check for non-addr auth option 
> auth.login.dffafb7e-3ff2-4e91-b30b-eb87c6cfe621.password
> [2015-08-09 08:13:16.368953] E [MSGID: 115041] [server.c:833:reconfigure] 
> 0-vol_home-server: Reconfigure not found for transport
> [2015-08-09 08:13:16.377119] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 
> 0-glusterfs: No change in volfile, continuing
> [2015-08-09 08:13:16.393164] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 
> 0-glusterfs: No change in volfile, continuing
> [2015-08-09 08:13:16.402136] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 
> 0-glusterfs: No change in volfile, continuing
> [2015-08-09 08:13:16.410998] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 
> 0-glusterfs: No change in volfile, continuing
> [2015-08-09 08:22:16.000685] E [MSGID: 113104] 
> [posix-handle.c:154:posix_make_ancestryfromgfid] 0-vol_home-posix: could not 
> read the link from the gfid handle 
> /export/brick_home/brick2/data/.glusterfs/b3/7a/b37a7750-f250-4ab4-8b29-bba519b6dc69
>   [Aucun fichier ou dossier de ce type]
> [2015-08-09 08:32:17.000668] E [MSGID: 113104] 
> [posix-handle.c:154:posix_make_ancestryfromgfid] 0-vol_home-posix: could not 
> read the link from the gfid handle 
> /export/brick_home/brick2/data/.glusterfs/b3/7a/b37a7750-f250-4ab4-8b29-bba519b6dc69
>   [Aucun fichier ou dossier de ce type]
> 
> No change in logs if i run the command with a simple user but a slightly 
> better performance for write but a slightly lower performance for read:
> $ ddt -t 35g /home/admin_team/letessier/
> Writing to /home/admin_team/letessier/ddt.12489 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/admin_team/letessier/ddt.12489 ... done.
> 35840MiB    KiB/s  CPU%
> Write      280981     3
> Read       313502     2
> 
> Any other idea?
> 
> Frankly, I'm very frustrated for having stopped our scientific computing 
> production more than six weeks ago and, due to cascading issues in GlusterFS, 
> to not be able to restart it for the moment and…
> 
> Thanks again,
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: [email protected]
> 
> Le 8 août 2015 à 10:02, Mathieu Chateau <[email protected]> a écrit :
> 
>> Maybe related to the insecure port issue reported ?
>> 
>> try with :
>> 
>> gluster volume set xxx server.allow-insecure on
>> 
>> Cordialement,
>> Mathieu CHATEAU
>> http://www.lotp.fr
>> 
>> 2015-08-07 23:47 GMT+02:00 Geoffrey Letessier <[email protected]>:
>> I’m not really sure to well understand your answer.
>> 
>> I try to set inode-lru-limit to 1, I can not notice any good effect. 
>> 
>> When i re-run ddt application, I can note 2 kinds of messages:
>> [2015-08-07 21:29:21.792156] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
>> 0-vol_home-marker: parent is NULL for 
>> <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
>> [2015-08-07 21:29:21.792176] W [marker-quota.c:3379:_mq_initiate_quota_txn] 
>> 0-vol_home-marker: parent is NULL for 
>> <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
>> 
>> and/or:
>> [2015-08-07 21:44:19.279971] E [marker-quota.c:2990:mq_start_quota_txn_v2] 
>> 0-vol_home-marker: contribution node list is empty 
>> (31d7bf88-b63a-4731-a737-a3dce73b8cd1)
>> [2015-08-07 21:41:26.177095] E [dict.c:1418:dict_copy_with_ref] 
>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
>>  [0x7f85e9a6a410] 
>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) 
>> [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) 
>> [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
>> 
>> And concerning the bad IO performance?
>> 
>> [letessier@node031 ~]$ ddt -t 35g /home/admin_team/letessier/
>> Writing to /home/admin_team/letessier/ddt.25259 ... syncing ... done.
>> sleeping 10 seconds ... done.
>> Reading from /home/admin_team/letessier/ddt.25259 ... done.
>> 35840MiB    KiB/s  CPU%
>> Write      277451     3
>> Read       188682     1
>> [letessier@node031 ~]$ logout
>> [root@node031 ~]# ddt -t 35g /home/
>> Writing to /home/ddt.25559 ... syncing ... done.
>> sleeping 10 seconds ... done.
>> Reading from /home/ddt.25559 ... done.
>> 35840MiB    KiB/s  CPU%
>> Write      196539     2
>> Read       438944     3
>> Notice the read/write throughput differences when i’m root and when i’m a 
>> simple user.
>> 
>> Thanks.
>> Geoffrey
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: [email protected]
>> 
>> Le 7 août 2015 à 14:57, Vijaikumar M <[email protected]> a écrit :
>> 
>>> 
>>> 
>>> On Friday 07 August 2015 05:34 PM, Geoffrey Letessier wrote:
>>>> Hi Vijay, 
>>>> 
>>>> My brick logs issue and big performance problem have begun when I upgraded 
>>>> Gluster into 3.7.3 version; before write throughput was good enough 
>>>> (~500MBs) -but not as good as with GlusterFS 3.5.3 (especially with 
>>>> distributed volumes)- and didn’t notice these problème with brick-logs.
>>>> 
>>>> OK… in live:
>>>> 
>>>> i just disable to quota for my home volume and now my performance appears 
>>>> to be relatively better (around 300MBs) but i still see the logs (from 
>>>> storage1 and its replicate storage2) growing up with only this kind of 
>>>> lines:
>>>> [2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref] 
>>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
>>>>  [0x7f85e9a6a410] 
>>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) 
>>>> [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) 
>>>> [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
>>>> 
>>> We have root caused log issue,  bug# 1244613 tracks this issue
>>> 
>>> 
>>>> After a few minutes: my write throughput seems to be now correct (~550MBs) 
>>>> but the log are still growing up (to not say exploding). So one part of 
>>>> the problem looks like taking its origin in the quota system management.
>>>> … after a few minutes (and still only 1 client connected), now it is the 
>>>> read operation which is very very slow… -I’m gonna become crazy! :/-
>>>> # ddt -t 50g /home/
>>>> Writing to /home/ddt.11293 ... syncing ... done.
>>>> sleeping 10 seconds ... done.
>>>> Reading from /home/ddt.11293 ... done.
>>>> 35840MiB    KiB/s  CPU%
>>>> Write      568201     5
>>>> Read       567008     4
>>>> # ddt -t 50g /home/
>>>> Writing to /home/ddt.11397 ... syncing ... done.
>>>> sleeping 10 seconds ... done.
>>>> Reading from /home/ddt.11397 ... done.
>>>> 51200MiB    KiB/s  CPU%
>>>> Write      573631     5
>>>> Read       164716     1
>>>> 
>>>> and my log are still exploding…
>>>> 
>>>> After having re-enabled the quota on my volume: 
>>>> # ddt -t 50g /home/
>>>> Writing to /home/ddt.11817 ... syncing ... done.
>>>> sleeping 10 seconds ... done.
>>>> Reading from /home/ddt.11817 ... done.
>>>> 51200MiB    KiB/s  CPU%
>>>> Write      269608     3
>>>> Read       160219     1
>>>> 
>>>> Thanks 
>>>> Geoffrey 
>>>> ------------------------------------------------------
>>>> Geoffrey Letessier
>>>> Responsable informatique & ingénieur système
>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>> Institut de Biologie Physico-Chimique
>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>> 
>>>> Le 7 août 2015 à 06:28, Vijaikumar M <[email protected]> a écrit :
>>>> 
>>>>> Hi Geoffrey,
>>>>> 
>>>>> Some performance improvements has been done in quota in glusterfs-3.7.3.
>>>>> Could you upgrade to glusterfs-3.7.3 and see if this helps
>>>>> 
>>>>> Thanks,
>>>>> Vijay
>>>>> 
>>>>> 
>>>>> On Friday 07 August 2015 05:02 AM, Geoffrey Letessier wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> No idea to help me fix this issue? (big logs, small write performance 
>>>>>> (/4), etc.)
>>>>>> 
>>>>>> For comparison, here to volumes: 
>>>>>> 
>>>>>>                 - home: distributed on 4 bricks / 2 nodes  (and 
>>>>>> replicated on 4 other bricks / 2 other nodes):
>>>>>> # ddt -t 35g /home
>>>>>> Writing to /home/ddt.24172 ... syncing ... done.
>>>>>> sleeping 10 seconds ... done.
>>>>>> Reading from /home/ddt.24172 ... done.
>>>>>> 33792MiB    KiB/s  CPU%
>>>>>> Write      103659     1
>>>>>> Read       391955     3
>>>>>> 
>>>>>> 
>>>>>>                 - workdir: distributed on 4 bricks / 2 nodes (one the 
>>>>>> same RAID volumes and servers than home):
>>>>>> # ddt -t 35g /workdir
>>>>>> Writing to /workdir/ddt.24717 ... syncing ... done.
>>>>>> sleeping 10 seconds ... done.
>>>>>> Reading from /workdir/ddt.24717 ... done.
>>>>>> 35840MiB    KiB/s  CPU%
>>>>>> Write      738314     4
>>>>>> Read       536497     4
>>>>>> 
>>>>>> For information, previously on 3.5.3-2 version, I obtained roughly 
>>>>>> 1.1GBs for workdir volume and ~550-600MBs for home.
>>>>>> 
>>>>>> All my tests (CP, RSYNC, etc.) provides me the same result (write 
>>>>>> throughput between 100MBs and 150MBs)
>>>>>> 
>>>>>> Thanks.
>>>>>> Geoffrey
>>>>>> ------------------------------------------------------
>>>>>> Geoffrey Letessier
>>>>>> Responsable informatique & ingénieur système
>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>> Institut de Biologie Physico-Chimique
>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>>> 
>>>>>> Le 5 août 2015 à 10:40, Geoffrey Letessier <[email protected]> 
>>>>>> a écrit :
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> In addition, knowing I have reactivated the log (brick-log-level = INFO 
>>>>>>> not CRITICAL) only for the file creation duration (i.e. a few minutes), 
>>>>>>> do you have noticed the log sizes and the number of lines inside:
>>>>>>> # ls -lh storage*
>>>>>>> -rw-------  1 letessier  staff    18M  5 aoû 00:54 
>>>>>>> storage1__export-brick_home-brick1-data.log
>>>>>>> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54 
>>>>>>> storage1__export-brick_home-brick2-data.log
>>>>>>> -rw-------  1 letessier  staff    15M  5 aoû 00:56 
>>>>>>> storage2__export-brick_home-brick1-data.log
>>>>>>> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54 
>>>>>>> storage2__export-brick_home-brick2-data.log
>>>>>>> -rw-------  1 letessier  staff    47M  5 aoû 00:55 
>>>>>>> storage3__export-brick_home-brick1-data.log
>>>>>>> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54 
>>>>>>> storage3__export-brick_home-brick2-data.log
>>>>>>> -rw-------  1 letessier  staff    47M  5 aoû 00:55 
>>>>>>> storage4__export-brick_home-brick1-data.log
>>>>>>> -rw-------  1 letessier  staff   2,1K  5 aoû 00:55 
>>>>>>> storage4__export-brick_home-brick2-data.log
>>>>>>> 
>>>>>>> # wc -l storage*
>>>>>>>    55381 storage1__export-brick_home-brick1-data.log
>>>>>>>       17 storage1__export-brick_home-brick2-data.log
>>>>>>>    41636 storage2__export-brick_home-brick1-data.log
>>>>>>>       17 storage2__export-brick_home-brick2-data.log
>>>>>>>   270360 storage3__export-brick_home-brick1-data.log
>>>>>>>       17 storage3__export-brick_home-brick2-data.log
>>>>>>>   270358 storage4__export-brick_home-brick1-data.log
>>>>>>>       17 storage4__export-brick_home-brick2-data.log
>>>>>>>   637803 total
>>>>>>> 
>>>>>>> If the let brick-log-level to INFO, the brick log files in each server 
>>>>>>> will consume all my /var partition capacity within only a few 
>>>>>>> hours/days…
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> Geoffrey
>>>>>>> ------------------------------------------------------
>>>>>>> Geoffrey Letessier
>>>>>>> Responsable informatique & ingénieur système
>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>>>> 
>>>>>>> Le 5 août 2015 à 01:12, Geoffrey Letessier <[email protected]> 
>>>>>>> a écrit :
>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Since the problem motioned previously (all errors noticed in brick log 
>>>>>>>> files), i notice a very very bad performance: i can note my write 
>>>>>>>> performance divided by 4 than previously -knowing it was not so good 
>>>>>>>> before.
>>>>>>>> Now, a write of a 33GB file, my write throughput is around 150MBs 
>>>>>>>> (with Infiniband), before it was around 550-600MBs; and this, both 
>>>>>>>> with RDMA and TCP protocol.
>>>>>>>> 
>>>>>>>> During this test, more than 40 000 error lines (as the following) were 
>>>>>>>> added to the brick log files.
>>>>>>>> [2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref] 
>>>>>>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
>>>>>>>>  [0x7f021c6f7410] 
>>>>>>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
>>>>>>>>  [0x7f021c6f7188] 
>>>>>>>> -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) 
>>>>>>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> All brick log files are in attachments.
>>>>>>>> 
>>>>>>>> Thanks in advance for all your help and fix,
>>>>>>>> Best,
>>>>>>>> Geoffrey
>>>>>>>> 
>>>>>>>> PS: question: is it possible to easily downgrade GlusterFS to a 
>>>>>>>> previous version from 3.7 (for example: v3.5)?
>>>>>>>> 
>>>>>>>> ------------------------------------------------------
>>>>>>>> Geoffrey Letessier
>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>> Tel: 01 58 41 50 93 - eMail: [email protected]
>>>>>>>> <bricks-logs.tgz>
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://www.gluster.org/mailman/listinfo/gluster-users
>> 
> 
> 
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-users
>

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Cascading errors and very bad write performance

Reply via email to