I think I just worked out why NFS lookups are sometimes slow and sometimes fast: the hostname uses round-robin DNS. If I change to a specific host, 01-B, it's always quick, and if I change to the other brick host, 02-B, it's always slow. Maybe that will help to narrow this down?
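For anyone who wants to verify this, something along these lines should show both the round-robin behaviour and which server the slowness follows (a rough sketch only; HOSTNAME, /mnt/nfs-test and "folder" are placeholders, as elsewhere in this thread):

# see which addresses the shared hostname returns, and in what order
dig +short HOSTNAME
getent ahosts HOSTNAME

# time the same directory against each brick host's NFS export directly
mount -t nfs -o vers=3 01-B:/gvAA01 /mnt/nfs-test
time ls /mnt/nfs-test/folder
umount /mnt/nfs-test
mount -t nfs -o vers=3 02-B:/gvAA01 /mnt/nfs-test
time ls /mnt/nfs-test/folder
umount /mnt/nfs-test

If the slow lookups always follow 02-B, pinning the NFS clients to 01-B in fstab (01-B:/gvAA01 instead of HOSTNAME:/gvAA01) may at least work around it while the root cause is investigated.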
On Sun, Apr 21, 2019 at 10:24 PM Patrick Rennie <[email protected]> wrote: > Hi Strahil, > > Thank you for your reply and your suggestions. I'm not sure which logs > would be most relevant to check to diagnose this issue, we have the > brick logs, the cluster mount logs, the shd logs or something else? I have > posted a few that I have seen repeated a few times already. I will continue > to post anything further that I see. > I am working on migrating data to some new storage, so this will slowly > free up space, although this is a production cluster and new data is being > uploaded every day, sometimes faster than I can migrate it off. I have > several other similar clusters and none of them have the same problem, one of > the others is actually at 98-99% right now (big problem, I know) but still > performs perfectly fine compared to this cluster, I am not sure low space > is the root cause here. > > I currently have 13 VMs accessing this cluster, I have checked each one > and all of them use one of the two options below to mount the cluster in > fstab > > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no > 0 0 > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable > > I also have a few other VMs which use NFS to access the cluster, and these > machines appear to be significantly quicker, initially I get a similar > delay with NFS but if I cancel the first "ls" and try it again I get < 1 > sec lookups, this can take over 10 minutes via the FUSE/gluster client, but the > same trick of cancelling and trying again doesn't work for FUSE/gluster. > Sometimes the NFS queries have no delay at all, so this is a bit strange to > me. > HOSTNAME:/gvAA01 /mountpoint/ nfs > defaults,_netdev,vers=3,async,noatime 0 0 > > Example: > user@VM:~$ time ls /cluster/folder > ^C > > real 9m49.383s > user 0m0.001s > sys 0m0.010s > > user@VM:~$ time ls /cluster/folder > <results> > > real 0m0.069s > user 0m0.001s > sys 0m0.007s > > --- > > I have checked the profiling as you suggested, I let it run for around a > minute, then cancelled it and saved the profile info. > > root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start > Starting volume profile on gvAA01 has been successful > root@HOSTNAME:/var/log/glusterfs# time ls /cluster/folder > ^C > > real 1m1.660s > user 0m0.000s > sys 0m0.002s > > root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> > ~/profile.txt > root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop > > I will attach the results to this email as it's over 1000 lines. > Unfortunately, I'm not sure what I'm looking at but possibly somebody will > be able to help me make sense of it and let me know if it highlights any > specific issues. > > Happy to try any further suggestions. Thank you, > > -Patrick > > On Sun, Apr 21, 2019 at 7:55 PM Strahil <[email protected]> wrote: > >> By the way, can you provide the 'volume info' and the mount options on >> all clients? >> Maybe there is an option that uses a lot of resources due to some >> client's mount options. >> >> Best Regards, >> Strahil Nikolov >> On Apr 21, 2019 10:55, Patrick Rennie <[email protected]> wrote: >> >> Just another small update, I'm continuing to watch my brick logs and I >> just saw these errors come up in the recent events too. I am going to >> continue to post any errors I see in the hope of finding the right one to >> try and fix..
>> This is from the logs on brick1, seems to be occurring on both nodes on >> brick1, although at different times. I'm not sure what this means, can >> anyone shed any light? >> I guess I am looking for some kind of specific error which may indicate >> something is broken or stuck and locking up and causing the extreme latency >> I'm seeing in the cluster. >> >> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) >> [0x7f3b3e93158a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) >> [0x7f3b3e4c5d45] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> 
[2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> >> Thanks again, >> >> -Patrick >> >> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie <[email protected]> >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately >> it's still just as slow and causing more problems for our operations now. I >> will need to try and take some steps to at least bring performance back to >> normal while continuing to investigate the issue longer term. I can >> definitely see one node with heavier CPU than the other, almost double, >> which I am OK with, but I think the heal process is going to take forever, >> trying to check the "gluster volume heal info" shows thousands and >> thousands of files which may need healing, I have no idea how many in total >> the command is still running after hours, so I am not sure what has gone so >> wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks >> like I'm on the latest version there. 
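For reference, the op-version check mentioned here boils down to a couple of commands (a rough sketch; the value to set comes from whatever max-op-version your own cluster reports):

gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version
# if op-version is below max-op-version, raise it using the number reported above:
gluster volume set all cluster.op-version <max-op-version>

Raising the op-version is one-way, so it should only be done once every node is running the same GlusterFS version.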
>> >> I have no idea how long the healing is going to take on this cluster, we >> have around 560TB of data on here, but I don't think I can wait that long >> to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out >> what's causing the extreme latency? >> >> I've been going through the cluster client logs of some of our VMs and on >> some of our FTP servers I found this in the cluster mount log, but I am not >> seeing it on any of our other servers, just our FTP servers. >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >> operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >> operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws >> here. >> >> I am going to hold off on the version upgrade until I know there are no >> files which need healing, which could be a while, from some reading I've >> done there shouldn't be any issues with this as both are on v3.12.x >> >> I've freed up a small amount of space, but I still need to work on this >> further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" >> which could be run on each brick and it would potentially clean up any >> files which were deleted straight from the bricks, but not via the client, >> I have a feeling this could help me free up about 5-10TB per brick from >> what I've been told about the history of this cluster. Can anyone confirm >> if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks >> again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic <[email protected]> >> wrote: >> >> Patrick, >> >> Sounds like progress. Be aware that gluster is expected to max out the >> CPUs on at least one of your servers while healing. This is normal and >> won’t adversely affect overall performance (any more than having bricks in >> need of healing, at any rate) unless you’re overdoing it. shd threads <= 4 >> should not do that on your hardware. Other tunings may have also increased >> overall performance, so you may see higher CPU than previously anyway. I’d >> recommend upping those thread counts and letting it heal as fast as >> possible, especially if these are dedicated Gluster storage servers (Ie: >> not also running VMs, etc). You should see “normal” CPU use once heals are >> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >> cores). It’s also likely to be different between your servers, in a pure >> replica, one tends to max and one tends to be a little higher, in a >> distributed-replica, I’d expect more than one to run harder while healing. >> >> Keep the differences between doing an ls on a brick and doing an ls on a >> gluster mount in mind.
When you do a ls on a gluster volume, it isn’t just >> doing a ls on one brick, it’s effectively doing it on ALL of your bricks, >> and they all have to return data before the ls succeeds. In a distributed >> volume, it’s figuring out where on each volume things live and getting the >> stat() from each to assemble the whole thing. And if things are in need of >> healing, it will take even longer to decide which version is current and >> use it (shd triggers a heal anytime it encounters this). Any of these >> things being slow slows down the overall response. >> >> At this point, I’d get some sleep too, and let your cluster heal while >> you do. I’d really want it fully healed before I did any updates anyway, so >> let it use CPU and get itself sorted out. Expect it to do a round of >> healing after you upgrade each machine too, this is normal so don’t let the >> CPU spike surprise you, It’s just catching up from the downtime incurred by >> the update and/or reboot if you did one. >> >> That reminds me, check your gluster cluster.op-version and >> cluster.max-op-version (gluster vol get all all | grep op-version). If >> op-version isn’t at the max-op-verison, set it to it so you’re taking >> advantage of the latest features available to your version. >> >> -Darrell >> >> On Apr 20, 2019, at 11:54 AM, Patrick Rennie <[email protected]> >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've applied the acltype=posixacl on my >> zpools and I think that has reduced some of the noise from my brick logs. >> I also bumped up some of the thread counts you suggested but my CPU load >> skyrocketed, so I dropped it back down to something slightly lower, but >> still higher than it was before, and will see how that goes for a while. >> >> Although low space is a definite issue, if I run an ls anywhere on my >> bricks directly it's instant, <1 second, and still takes several minutes >> via gluster, so there is still a problem in my gluster configuration >> somewhere. We don't have any snapshots, but I am trying to work out if any >> data on there is safe to delete, or if there is any way I can safely find >> and delete data which has been removed directly from the bricks in the >> past. I also have lz4 compression already enabled on each zpool which does >> help a bit, we get between 1.05 and 1.08x compression on this data. >> I've tried to go through each client and checked it's cluster mount logs >> and also my brick logs and looking for errors, so far nothing is jumping >> out at me, but there are some warnings and errors here and there, I am >> trying to work out what they mean. >> >> It's already 1 am here and unfortunately, I'm still awake working on this >> issue, but I think that I will have to leave the version upgrades until >> tomorrow. >> >> Thanks again for your advice so far. If anyone has any ideas on where I >> can look for errors other than brick logs or the cluster mount logs to help >> resolve this issue, it would be much appreciated. >> >> Cheers, >> >> - Patrick >> >> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic <[email protected]> >> wrote: >> >> See inline: >> >> On Apr 20, 2019, at 10:09 AM, Patrick Rennie <[email protected]> >> wrote: >> >> Hi Darrell, >> >> Thanks for your reply, this issue seems to be getting worse over the last >> few days, really has me tearing my hair out. I will do as you have >> suggested and get started on upgrading from 3.12.14 to 3.12.15. 
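On the "find .glusterfs -type f -links -2" cleanup mentioned earlier in the thread: before running anything destructive, a read-only dry run per brick would show how many orphaned gfid files there are and roughly how much space they hold (a sketch only; brick paths as shown in the volume status further down):

cd /brick1/gvAA01/brick
# count candidate files (gfid entries whose user-visible hardlink is gone)
find .glusterfs -type f -links -2 | wc -l
# rough size estimate of what the -exec rm version would reclaim
find .glusterfs -type f -links -2 -printf '%s\n' | awk '{t+=$1} END {printf "%.1f GiB\n", t/1024/1024/1024}'

Nothing is removed by the commands above; note the listing can also include a few gluster housekeeping files (for example .glusterfs/health_check), so the output is worth reviewing before adding any delete action.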
>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >> none of them has "acltype=posixacl" set, currently the acltype property >> shows "off", if I make these changes will it apply retroactively to the >> existing data? I'm unfamiliar with what this will change so I may need to >> look into that before I proceed. >> >> >> It is safe to apply that now, any new set/get calls will then use it if >> new posixacls exist, and use older if not. ZFS is good that way. It should >> clear up your posix_acl and posix errors over time. >> >> I understand performance is going to slow down as the bricks get full, I >> am currently trying to free space and migrate data to some newer storage, I >> have fresh several hundred TB storage I just setup recently but with these >> performance issues it's really slow. I also believe there is significant >> data which has been deleted directly from the bricks in the past, so if I >> can reclaim this space in a safe manner then I will have at least around >> 10-15% free space. >> >> >> Full ZFS volumes will have a much larger impact on performance than you’d >> think, I’d prioritize this. If you have been taking zfs snapshots, consider >> deleting them to get the overall volume free space back up. And just to be >> sure it’s been said, delete from within the mounted volumes, don’t delete >> directly from the bricks (gluster will just try and heal it later, >> compounding your issues). Does not apply to deleting other data from the >> ZFS volume if it’s not part of the brick directory, of course. >> >> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >> generally they have plenty of resources available, currently only using >> around 330/512GB of memory. >> >> I will look into what your suggested settings will change, and then will >> probably go ahead with your recommendations, for our specs as stated above, >> what would you suggest for performance.io-thread-count ? >> >> >> I run single 2630v4s on my servers, which have a smaller storage >> footprint than yours. I’d go with 32 for performance.io-thread-count. >> I’d try 4 for the shd thread settings on that gear. Your memory use sounds >> fine, so no worries there. >> >> Our workload is nothing too extreme, we have a few VMs which write backup >> data to this storage nightly for our clients, our VMs don't live on this >> cluster, but just write to it. >> >> >> If they are writing compressible data, you’ll get immediate benefit by >> setting compression=lz4 on your ZFS volumes. It won’t help any old data, of >> course, but it will compress new data going forward. This is another one >> that’s safe to enable on the fly. >> >> I've been going through all of the logs I can, below are some slightly >> sanitized errors I've come across, but I'm not sure what to make of them. >> The main error I am seeing is the first one below, across several of my >> bricks, but possibly only for specific folders on the cluster, I'm not 100% >> about that yet though. 
>> >> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> >> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for >> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No >> data available] >> >> >> posixacls should clear those up, as mentioned. 
>> >> >> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >> by 980fdbbd367f0000 on 0x7fc4f0161440 >> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >> client: >> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >> error-xlator: gvAA01-locks [Invalid argument] >> >> >> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >> [0x7ff4ae6f796a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >> [0x7ff4ae2a96e8] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >> [0x7ff4ae28528d] ) 0-: Reply submission failed >> >> >> Fix the posix acls and see if these clear up over time as well, I’m >> unclear on what the overall effect of running without the posix acls will >> be to total gluster health. Your biggest problem sounds like you need to >> free up space on the volumes and get the overall volume health back up to >> par and see if that doesn’t resolve the symptoms you’re seeing. >> >> >> >> Thank you again for your assistance. It is greatly appreciated. >> >> - Patrick >> >> >> >> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic <[email protected]> >> wrote: >> >> Patrick, >> >> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >> also mention ZFS, and that error you show makes me think you need to check >> to be sure you have “xattr=sa” and “acltype=posixacl” set on your ZFS >> volumes. >> >> You also observed your bricks are crossing the 95% full line, ZFS >> performance will degrade significantly the closer you get to full. In my >> experience, this starts somewhere between 10% and 5% free space remaining, >> so you’re in that realm. >> >> How’s your free memory on the servers doing? Do you have your zfs arc >> cache limited to something less than all the RAM? It shares pretty well, >> but I’ve encountered situations where other things won’t try and take ram >> back properly if they think it’s in use, so ZFS never gets the opportunity >> to give it up. >> >> Since your volume is a disperse-replica, you might try tuning >> disperse.shd-max-threads, default is 1, I’d try it at 2, 4, or even more if >> the CPUs are beefy enough. And setting server.event-threads to 4 and >> client.event-threads to 8 has proven helpful in many cases. After you get >> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >> don’t know if it matters, but I’d also recommend resetting >> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >> also setting performance.io-thread-count to 32 if those have beefy CPUs. >> >> Beyond those general ideas, more info about your hardware (CPU and RAM) >> and workload (VMs, direct storage for web servers or enders, etc) may net >> you some more ideas. Then you’re going to have to do more digging into >> brick logs looking for errors and/or warnings to see what’s going on. 
>> >> -Darrell >> >> >> On Apr 20, 2019, at 8:22 AM, Patrick Rennie <[email protected]> >> wrote: >> >> Hello Gluster Users, >> >> I am hoping someone can help me with resolving an ongoing issue I've been >> having, I'm new to mailing lists so forgive me if I have gotten anything >> wrong. We have noticed our performance deteriorating over the last few >> weeks, easily measured by trying to do an ls on one of our top-level >> folders, and timing it, which usually would take 2-5 seconds, and now takes >> up to 20 minutes, which obviously renders our cluster basically unusable. >> This has been intermittent in the past but is now almost constant and I am >> not sure how to work out the exact cause. We have noticed some errors in >> the brick logs, and have noticed that if we kill the right brick process, >> performance instantly returns back to normal, this is not always the same >> brick, but it indicates to me something in the brick processes or >> background tasks may be causing extreme latency. Due to this ability to fix >> it by killing the right brick process off, I think it's a specific file, or >> folder, or operation which may be hanging and causing the increased >> latency, but I am not sure how to work it out. One last thing to add is >> that our bricks are getting quite full (~95% full), we are trying to >> migrate data off to new storage but that is going slowly, not helped by >> this issue. I am currently trying to run a full heal as there appear to be >> many files needing healing, and I have all brick processes running so they >> have an opportunity to heal, but this means performance is very poor. It >> currently takes over 15-20 minutes to do an ls of one of our top-level >> folders, which just contains 60-80 other folders, this should take 2-5 >> seconds. This is all being checked by FUSE mount locally on the storage >> node itself, but it is the same for other clients and VMs accessing the >> cluster. Initially, it seemed our NFS mounts were not affected and operated >> at normal speed, but testing over the last day has shown that our NFS >> clients are also extremely slow, so it doesn't seem specific to FUSE as I >> first thought it might be. >> >> I am not sure how to proceed from here, I am fairly new to gluster having >> inherited this setup from my predecessor and trying to keep it going. I >> have included some info below to try and help with diagnosis, please let me >> know if any further info would be helpful. I would really appreciate any >> advice on what I could try to work out the cause. Thank you in advance for >> reading this, and any suggestions you might be able to offer. >> >> - Patrick >> >> This is an example of the main error I see in our brick logs, there have >> been others, I can post them when I see them again too: >> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick1/<filename> library: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> Our setup consists of 2 storage nodes and an arbiter node. I have noticed >> our nodes are on slightly different versions, I'm not sure if this could be >> an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - >> total capacity is around 560TB. 
>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with >> iperf and found that it's what would be expected from this config. >> Individual brick performance seems ok, I've tested several bricks using >> dd and can write a 10GB files at 1.7GB/s. >> >> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >> 10000+0 records in >> 10000+0 records out >> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >> >> Node 1: >> # glusterfs --version >> glusterfs 3.12.15 >> >> Node 2: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Arbiter: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Here is our gluster volume status: >> >> # gluster volume status >> Status of volume: gvAA01 >> Gluster process TCP Port RDMA Port Online >> Pid >> >> ------------------------------------------------------------------------------ >> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck1 49152 0 Y >> 6931 >> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck2 49153 0 Y >> 6939 >> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck3 49154 0 Y >> 6947 >> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck4 49155 0 Y >> 6956 >> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck5 49156 0 Y >> 6964 >> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck6 49157 0 Y >> 6974 >> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck7 49158 0 Y >> 6984 >> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck8 49159 0 Y >> 6993 >> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck9 49160 0 Y >> 7001 >> NFS Server on localhost 2049 0 Y >> 17276 >> Self-heal Daemon on localhost N/A N/A Y >> 25245 >> NFS Server on 02-B 2049 0 Y 9089 >> Self-heal Daemon on 02-B N/A N/A Y 17838 >> NFS Server on 00-a 2049 0 Y 15660 >> Self-heal Daemon on 00-a N/A N/A Y 16218 >> >> Task Status of Volume gvAA01 >> >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> And gluster volume info: >> >> # gluster volume info >> >> Volume Name: gvAA01 >> Type: Distributed-Replicate >> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 9 x (2 + 1) = 27 >> Transport-type: tcp >> Bricks: >> Brick1: 01-B:/brick1/gvAA01/brick >> Brick2: 02-B:/brick1/gvAA01/brick >> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >> Brick4: 01-B:/brick2/gvAA01/brick >> Brick5: 02-B:/brick2/gvAA01/brick >> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >> Brick7: 01-B:/brick3/gvAA01/brick >> Brick8: 02-B:/brick3/gvAA01/brick >> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >> Brick10: 01-B:/brick4/gvAA01/brick >> Brick11: 02-B:/brick4/gvAA01/brick >> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >> Brick13: 
01-B:/brick5/gvAA01/brick >> Brick14: 02-B:/brick5/gvAA01/brick >> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >> Brick16: 01-B:/brick6/gvAA01/brick >> Brick17: 02-B:/brick6/gvAA01/brick >> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >> Brick19: 01-B:/brick7/gvAA01/brick >> Brick20: 02-B:/brick7/gvAA01/brick >> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >> Brick22: 01-B:/brick8/gvAA01/brick >> Brick23: 02-B:/brick8/gvAA01/brick >> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >> Brick25: 01-B:/brick9/gvAA01/brick >> Brick26: 02-B:/brick9/gvAA01/brick >> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >> Options Reconfigured: >> cluster.shd-max-threads: 4 >> performance.least-prio-threads: 16 >> cluster.readdir-optimize: on >> performance.quick-read: off >> performance.stat-prefetch: off >> cluster.data-self-heal: on >> cluster.lookup-unhashed: auto >> cluster.lookup-optimize: on >> cluster.favorite-child-policy: mtime >> server.allow-insecure: on >> transport.address-family: inet >> client.bind-insecure: on >> cluster.entry-self-heal: off >> cluster.metadata-self-heal: off >> performance.md-cache-timeout: 600 >> cluster.self-heal-daemon: enable >> performance.readdir-ahead: on >> diagnostics.brick-log-level: INFO >> nfs.disable: off >> >> Thank you for any assistance. >> >> - Patrick >> _______________________________________________ >> Gluster-users mailing list >> [email protected] >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> >>
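To pull the ZFS and gluster tuning pieces from this thread together in one place, a rough sketch of the commands involved (the pool/dataset name is a placeholder; the values are the ones Darrell suggested above and should be sanity-checked against your own hardware before applying):

# ZFS properties (safe to change on a live pool; they affect new operations/data only)
zfs set xattr=sa pool/brick1
zfs set acltype=posixacl pool/brick1
zfs set compression=lz4 pool/brick1

# Gluster tunables discussed above
gluster volume set gvAA01 cluster.shd-max-threads 4
gluster volume set gvAA01 server.event-threads 4
gluster volume set gvAA01 client.event-threads 8
gluster volume set gvAA01 performance.io-thread-count 32
gluster volume set gvAA01 performance.least-prio-threads 1
# only once both storage nodes are on 3.12.15:
gluster volume set gvAA01 performance.stat-prefetch on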
_______________________________________________ Gluster-users mailing list [email protected] https://lists.gluster.org/mailman/listinfo/gluster-users
