I think I just worked out why NFS lookups are sometimes slow and sometimes fast: the hostname uses round-robin DNS. If I change to a specific host, 01-B, it's always quick, and if I change to the other brick host, 02-B, it's always slow. Maybe that will help to narrow this down?
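For anyone who wants to verify this, something along these lines should show both the round-robin behaviour and which server the slowness follows (a rough sketch only; HOSTNAME, /mnt/nfs-test and "folder" are placeholders, as elsewhere in this thread):

# see which addresses the shared hostname returns, and in what order
dig +short HOSTNAME
getent ahosts HOSTNAME

# time the same directory against each brick host's NFS export directly
mount -t nfs -o vers=3 01-B:/gvAA01 /mnt/nfs-test
time ls /mnt/nfs-test/folder
umount /mnt/nfs-test
mount -t nfs -o vers=3 02-B:/gvAA01 /mnt/nfs-test
time ls /mnt/nfs-test/folder
umount /mnt/nfs-test

If the slow lookups always follow 02-B, pinning the NFS clients to 01-B in fstab (01-B:/gvAA01 instead of HOSTNAME:/gvAA01) may at least work around it while the root cause is investigated.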
On Sun, Apr 21, 2019 at 10:24 PM Patrick Rennie <[email protected]> wrote: > Hi Strahil, > > Thank you for your reply and your suggestions. I'm not sure which logs > would be most relevant to check to diagnose this issue, we have the > brick logs, the cluster mount logs, the shd logs or something else? I have > posted a few that I have seen repeated a few times already. I will continue > to post anything further that I see. > I am working on migrating data to some new storage, so this will slowly > free up space, although this is a production cluster and new data is being > uploaded every day, sometimes faster than I can migrate it off. I have > several other similar clusters and none of them have the same problem, one of > the others is actually at 98-99% right now (big problem, I know) but still > performs perfectly fine compared to this cluster, I am not sure low space > is the root cause here. > > I currently have 13 VMs accessing this cluster, I have checked each one > and all of them use one of the two options below to mount the cluster in > fstab > > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no > 0 0 > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable > > I also have a few other VMs which use NFS to access the cluster, and these > machines appear to be significantly quicker, initially I get a similar > delay with NFS but if I cancel the first "ls" and try it again I get < 1 > sec lookups, this can take over 10 minutes via the FUSE/gluster client, but the > same trick of cancelling and trying again doesn't work for FUSE/gluster. > Sometimes the NFS queries have no delay at all, so this is a bit strange to > me. > HOSTNAME:/gvAA01 /mountpoint/ nfs > defaults,_netdev,vers=3,async,noatime 0 0 > > Example: > user@VM:~$ time ls /cluster/folder > ^C > > real 9m49.383s > user 0m0.001s > sys 0m0.010s > > user@VM:~$ time ls /cluster/folder > <results> > > real 0m0.069s > user 0m0.001s > sys 0m0.007s > > --- > > I have checked the profiling as you suggested, I let it run for around a > minute, then cancelled it and saved the profile info. > > root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start > Starting volume profile on gvAA01 has been successful > root@HOSTNAME:/var/log/glusterfs# time ls /cluster/folder > ^C > > real 1m1.660s > user 0m0.000s > sys 0m0.002s > > root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> > ~/profile.txt > root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop > > I will attach the results to this email as it's over 1000 lines. > Unfortunately, I'm not sure what I'm looking at but possibly somebody will > be able to help me make sense of it and let me know if it highlights any > specific issues. > > Happy to try any further suggestions. Thank you, > > -Patrick > > On Sun, Apr 21, 2019 at 7:55 PM Strahil <[email protected]> wrote: > >> By the way, can you provide the 'volume info' and the mount options on >> all clients? >> Maybe there is an option that uses a lot of resources due to some >> client's mount options. >> >> Best Regards, >> Strahil Nikolov >> On Apr 21, 2019 10:55, Patrick Rennie <[email protected]> wrote: >> >> Just another small update, I'm continuing to watch my brick logs and I >> just saw these errors come up in the recent events too. I am going to >> continue to post any errors I see in the hope of finding the right one to >> try and fix..
>> This is from the logs on brick1, seems to be occurring on both nodes on >> brick1, although at different times. I'm not sure what this means, can >> anyone shed any light? >> I guess I am looking for some kind of specific error which may indicate >> something is broken or stuck and locking up and causing the extreme latency >> I'm seeing in the cluster. >> >> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) >> [0x7f3b3e93158a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) >> [0x7f3b3e4c5d45] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> 
[2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> >> Thanks again, >> >> -Patrick >> >> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie <[email protected]> >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately >> it's still just as slow and causing more problems for our operations now. I >> will need to try and take some steps to at least bring performance back to >> normal while continuing to investigate the issue longer term. I can >> definitely see one node with heavier CPU than the other, almost double, >> which I am OK with, but I think the heal process is going to take forever, >> trying to check the "gluster volume heal info" shows thousands and >> thousands of files which may need healing, I have no idea how many in total >> the command is still running after hours, so I am not sure what has gone so >> wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks >> like I'm on the latest version there. 
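For reference, the op-version check mentioned here boils down to a couple of commands (a rough sketch; the value to set comes from whatever max-op-version your own cluster reports):

gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version
# if op-version is below max-op-version, raise it using the number reported above:
gluster volume set all cluster.op-version <max-op-version>

Raising the op-version is one-way, so it should only be done once every node is running the same GlusterFS version.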
>> >> I have no idea how long the healing is going to take on this cluster, we >> have around 560TB of data on here, but I don't think I can wait that long >> to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out >> what's causing the extreme latency? >> >> I've been going through the cluster client logs of some of our VMs and on >> some of our FTP servers I found this in the cluster mount log, but I am not >> seeing it on any of our other servers, just our FTP servers. >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >> operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >> operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws >> here. >> >> I am going to hold off on the version upgrade until I know there are no >> files which need healing, which could be a while, from some reading I've >> done there shouldn't be any issues with this as both are on v3.12.x >> >> I've freed up a small amount of space, but I still need to work on this >> further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" >> which could be run on each brick and it would potentially clean up any >> files which were deleted straight from the bricks, but not via the client, >> I have a feeling this could help me free up about 5-10TB per brick from >> what I've been told about the history of this cluster. Can anyone confirm >> if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks >> again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic <[email protected]> >> wrote: >> >> Patrick, >> >> Sounds like progress. Be aware that gluster is expected to max out the >> CPUs on at least one of your servers while healing. This is normal and >> won’t adversely affect overall performance (any more than having bricks in >> need of healing, at any rate) unless you’re overdoing it. shd threads <= 4 >> should not do that on your hardware. Other tunings may have also increased >> overall performance, so you may see higher CPU than previously anyway. I’d >> recommend upping those thread counts and letting it heal as fast as >> possible, especially if these are dedicated Gluster storage servers (Ie: >> not also running VMs, etc). You should see “normal” CPU use once heals are >> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >> cores). It’s also likely to be different between your servers, in a pure >> replica, one tends to max and one tends to be a little higher, in a >> distributed-replica, I’d expect more than one to run harder while healing. >> >> Keep the differences between doing an ls on a brick and doing an ls on a >> gluster mount in mind.
When you do a ls on a gluster volume, it isn’t just >> doing a ls on one brick, it’s effectively doing it on ALL of your bricks, >> and they all have to return data before the ls succeeds. In a distributed >> volume, it’s figuring out where on each volume things live and getting the >> stat() from each to assemble the whole thing. And if things are in need of >> healing, it will take even longer to decide which version is current and >> use it (shd triggers a heal anytime it encounters this). Any of these >> things being slow slows down the overall response. >> >> At this point, I’d get some sleep too, and let your cluster heal while >> you do. I’d really want it fully healed before I did any updates anyway, so >> let it use CPU and get itself sorted out. Expect it to do a round of >> healing after you upgrade each machine too, this is normal so don’t let the >> CPU spike surprise you, It’s just catching up from the downtime incurred by >> the update and/or reboot if you did one. >> >> That reminds me, check your gluster cluster.op-version and >> cluster.max-op-version (gluster vol get all all | grep op-version). If >> op-version isn’t at the max-op-verison, set it to it so you’re taking >> advantage of the latest features available to your version. >> >> -Darrell >> >> On Apr 20, 2019, at 11:54 AM, Patrick Rennie <[email protected]> >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've applied the acltype=posixacl on my >> zpools and I think that has reduced some of the noise from my brick logs. >> I also bumped up some of the thread counts you suggested but my CPU load >> skyrocketed, so I dropped it back down to something slightly lower, but >> still higher than it was before, and will see how that goes for a while. >> >> Although low space is a definite issue, if I run an ls anywhere on my >> bricks directly it's instant, <1 second, and still takes several minutes >> via gluster, so there is still a problem in my gluster configuration >> somewhere. We don't have any snapshots, but I am trying to work out if any >> data on there is safe to delete, or if there is any way I can safely find >> and delete data which has been removed directly from the bricks in the >> past. I also have lz4 compression already enabled on each zpool which does >> help a bit, we get between 1.05 and 1.08x compression on this data. >> I've tried to go through each client and checked it's cluster mount logs >> and also my brick logs and looking for errors, so far nothing is jumping >> out at me, but there are some warnings and errors here and there, I am >> trying to work out what they mean. >> >> It's already 1 am here and unfortunately, I'm still awake working on this >> issue, but I think that I will have to leave the version upgrades until >> tomorrow. >> >> Thanks again for your advice so far. If anyone has any ideas on where I >> can look for errors other than brick logs or the cluster mount logs to help >> resolve this issue, it would be much appreciated. >> >> Cheers, >> >> - Patrick >> >> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic <[email protected]> >> wrote: >> >> See inline: >> >> On Apr 20, 2019, at 10:09 AM, Patrick Rennie <[email protected]> >> wrote: >> >> Hi Darrell, >> >> Thanks for your reply, this issue seems to be getting worse over the last >> few days, really has me tearing my hair out. I will do as you have >> suggested and get started on upgrading from 3.12.14 to 3.12.15. 
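On the "find .glusterfs -type f -links -2" cleanup mentioned earlier in the thread: before running anything destructive, a read-only dry run per brick would show how many orphaned gfid files there are and roughly how much space they hold (a sketch only; brick paths as shown in the volume status further down):

cd /brick1/gvAA01/brick
# count candidate files (gfid entries whose user-visible hardlink is gone)
find .glusterfs -type f -links -2 | wc -l
# rough size estimate of what the -exec rm version would reclaim
find .glusterfs -type f -links -2 -printf '%s\n' | awk '{t+=$1} END {printf "%.1f GiB\n", t/1024/1024/1024}'

Nothing is removed by the commands above; note the listing can also include a few gluster housekeeping files (for example .glusterfs/health_check), so the output is worth reviewing before adding any delete action.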
>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >> none of them has "acltype=posixacl" set, currently the acltype property >> shows "off", if I make these changes will it apply retroactively to the >> existing data? I'm unfamiliar with what this will change so I may need to >> look into that before I proceed. >> >> >> It is safe to apply that now, any new set/get calls will then use it if >> new posixacls exist, and use older if not. ZFS is good that way. It should >> clear up your posix_acl and posix errors over time. >> >> I understand performance is going to slow down as the bricks get full, I >> am currently trying to free space and migrate data to some newer storage, I >> have fresh several hundred TB storage I just setup recently but with these >> performance issues it's really slow. I also believe there is significant >> data which has been deleted directly from the bricks in the past, so if I >> can reclaim this space in a safe manner then I will have at least around >> 10-15% free space. >> >> >> Full ZFS volumes will have a much larger impact on performance than you’d >> think, I’d prioritize this. If you have been taking zfs snapshots, consider >> deleting them to get the overall volume free space back up. And just to be >> sure it’s been said, delete from within the mounted volumes, don’t delete >> directly from the bricks (gluster will just try and heal it later, >> compounding your issues). Does not apply to deleting other data from the >> ZFS volume if it’s not part of the brick directory, of course. >> >> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >> generally they have plenty of resources available, currently only using >> around 330/512GB of memory. >> >> I will look into what your suggested settings will change, and then will >> probably go ahead with your recommendations, for our specs as stated above, >> what would you suggest for performance.io-thread-count ? >> >> >> I run single 2630v4s on my servers, which have a smaller storage >> footprint than yours. I’d go with 32 for performance.io-thread-count. >> I’d try 4 for the shd thread settings on that gear. Your memory use sounds >> fine, so no worries there. >> >> Our workload is nothing too extreme, we have a few VMs which write backup >> data to this storage nightly for our clients, our VMs don't live on this >> cluster, but just write to it. >> >> >> If they are writing compressible data, you’ll get immediate benefit by >> setting compression=lz4 on your ZFS volumes. It won’t help any old data, of >> course, but it will compress new data going forward. This is another one >> that’s safe to enable on the fly. >> >> I've been going through all of the logs I can, below are some slightly >> sanitized errors I've come across, but I'm not sure what to make of them. >> The main error I am seeing is the first one below, across several of my >> bricks, but possibly only for specific folders on the cluster, I'm not 100% >> about that yet though. 
>> >> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> >> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for >> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No >> data available] >> >> >> posixacls should clear those up, as mentioned. 
>> >> >> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >> by 980fdbbd367f0000 on 0x7fc4f0161440 >> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >> client: >> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >> error-xlator: gvAA01-locks [Invalid argument] >> >> >> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >> [0x7ff4ae6f796a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >> [0x7ff4ae2a96e8] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >> [0x7ff4ae28528d] ) 0-: Reply submission failed >> >> >> Fix the posix acls and see if these clear up over time as well, I’m >> unclear on what the overall effect of running without the posix acls will >> be to total gluster health. Your biggest problem sounds like you need to >> free up space on the volumes and get the overall volume health back up to >> par and see if that doesn’t resolve the symptoms you’re seeing. >> >> >> >> Thank you again for your assistance. It is greatly appreciated. >> >> - Patrick >> >> >> >> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic <[email protected]> >> wrote: >> >> Patrick, >> >> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >> also mention ZFS, and that error you show makes me think you need to check >> to be sure you have “xattr=sa” and “acltype=posixacl” set on your ZFS >> volumes. >> >> You also observed your bricks are crossing the 95% full line, ZFS >> performance will degrade significantly the closer you get to full. In my >> experience, this starts somewhere between 10% and 5% free space remaining, >> so you’re in that realm. >> >> How’s your free memory on the servers doing? Do you have your zfs arc >> cache limited to something less than all the RAM? It shares pretty well, >> but I’ve encountered situations where other things won’t try and take ram >> back properly if they think it’s in use, so ZFS never gets the opportunity >> to give it up. >> >> Since your volume is a disperse-replica, you might try tuning >> disperse.shd-max-threads, default is 1, I’d try it at 2, 4, or even more if >> the CPUs are beefy enough. And setting server.event-threads to 4 and >> client.event-threads to 8 has proven helpful in many cases. After you get >> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >> don’t know if it matters, but I’d also recommend resetting >> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >> also setting performance.io-thread-count to 32 if those have beefy CPUs. >> >> Beyond those general ideas, more info about your hardware (CPU and RAM) >> and workload (VMs, direct storage for web servers or enders, etc) may net >> you some more ideas. Then you’re going to have to do more digging into >> brick logs looking for errors and/or warnings to see what’s going on. 
>> >> -Darrell >> >> >> On Apr 20, 2019, at 8:22 AM, Patrick Rennie <[email protected]> >> wrote: >> >> Hello Gluster Users, >> >> I am hoping someone can help me with resolving an ongoing issue I've been >> having, I'm new to mailing lists so forgive me if I have gotten anything >> wrong. We have noticed our performance deteriorating over the last few >> weeks, easily measured by trying to do an ls on one of our top-level >> folders, and timing it, which usually would take 2-5 seconds, and now takes >> up to 20 minutes, which obviously renders our cluster basically unusable. >> This has been intermittent in the past but is now almost constant and I am >> not sure how to work out the exact cause. We have noticed some errors in >> the brick logs, and have noticed that if we kill the right brick process, >> performance instantly returns back to normal, this is not always the same >> brick, but it indicates to me something in the brick processes or >> background tasks may be causing extreme latency. Due to this ability to fix >> it by killing the right brick process off, I think it's a specific file, or >> folder, or operation which may be hanging and causing the increased >> latency, but I am not sure how to work it out. One last thing to add is >> that our bricks are getting quite full (~95% full), we are trying to >> migrate data off to new storage but that is going slowly, not helped by >> this issue. I am currently trying to run a full heal as there appear to be >> many files needing healing, and I have all brick processes running so they >> have an opportunity to heal, but this means performance is very poor. It >> currently takes over 15-20 minutes to do an ls of one of our top-level >> folders, which just contains 60-80 other folders, this should take 2-5 >> seconds. This is all being checked by FUSE mount locally on the storage >> node itself, but it is the same for other clients and VMs accessing the >> cluster. Initially, it seemed our NFS mounts were not affected and operated >> at normal speed, but testing over the last day has shown that our NFS >> clients are also extremely slow, so it doesn't seem specific to FUSE as I >> first thought it might be. >> >> I am not sure how to proceed from here, I am fairly new to gluster having >> inherited this setup from my predecessor and trying to keep it going. I >> have included some info below to try and help with diagnosis, please let me >> know if any further info would be helpful. I would really appreciate any >> advice on what I could try to work out the cause. Thank you in advance for >> reading this, and any suggestions you might be able to offer. >> >> - Patrick >> >> This is an example of the main error I see in our brick logs, there have >> been others, I can post them when I see them again too: >> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick1/<filename> library: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> Our setup consists of 2 storage nodes and an arbiter node. I have noticed >> our nodes are on slightly different versions, I'm not sure if this could be >> an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - >> total capacity is around 560TB. 
>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with >> iperf and found that it's what would be expected from this config. >> Individual brick performance seems ok, I've tested several bricks using >> dd and can write a 10GB files at 1.7GB/s. >> >> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >> 10000+0 records in >> 10000+0 records out >> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >> >> Node 1: >> # glusterfs --version >> glusterfs 3.12.15 >> >> Node 2: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Arbiter: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Here is our gluster volume status: >> >> # gluster volume status >> Status of volume: gvAA01 >> Gluster process TCP Port RDMA Port Online >> Pid >> >> ------------------------------------------------------------------------------ >> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck1 49152 0 Y >> 6931 >> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck2 49153 0 Y >> 6939 >> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck3 49154 0 Y >> 6947 >> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck4 49155 0 Y >> 6956 >> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck5 49156 0 Y >> 6964 >> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck6 49157 0 Y >> 6974 >> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck7 49158 0 Y >> 6984 >> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck8 49159 0 Y >> 6993 >> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck9 49160 0 Y >> 7001 >> NFS Server on localhost 2049 0 Y >> 17276 >> Self-heal Daemon on localhost N/A N/A Y >> 25245 >> NFS Server on 02-B 2049 0 Y 9089 >> Self-heal Daemon on 02-B N/A N/A Y 17838 >> NFS Server on 00-a 2049 0 Y 15660 >> Self-heal Daemon on 00-a N/A N/A Y 16218 >> >> Task Status of Volume gvAA01 >> >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> And gluster volume info: >> >> # gluster volume info >> >> Volume Name: gvAA01 >> Type: Distributed-Replicate >> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 9 x (2 + 1) = 27 >> Transport-type: tcp >> Bricks: >> Brick1: 01-B:/brick1/gvAA01/brick >> Brick2: 02-B:/brick1/gvAA01/brick >> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >> Brick4: 01-B:/brick2/gvAA01/brick >> Brick5: 02-B:/brick2/gvAA01/brick >> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >> Brick7: 01-B:/brick3/gvAA01/brick >> Brick8: 02-B:/brick3/gvAA01/brick >> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >> Brick10: 01-B:/brick4/gvAA01/brick >> Brick11: 02-B:/brick4/gvAA01/brick >> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >> Brick13: 
01-B:/brick5/gvAA01/brick >> Brick14: 02-B:/brick5/gvAA01/brick >> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >> Brick16: 01-B:/brick6/gvAA01/brick >> Brick17: 02-B:/brick6/gvAA01/brick >> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >> Brick19: 01-B:/brick7/gvAA01/brick >> Brick20: 02-B:/brick7/gvAA01/brick >> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >> Brick22: 01-B:/brick8/gvAA01/brick >> Brick23: 02-B:/brick8/gvAA01/brick >> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >> Brick25: 01-B:/brick9/gvAA01/brick >> Brick26: 02-B:/brick9/gvAA01/brick >> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >> Options Reconfigured: >> cluster.shd-max-threads: 4 >> performance.least-prio-threads: 16 >> cluster.readdir-optimize: on >> performance.quick-read: off >> performance.stat-prefetch: off >> cluster.data-self-heal: on >> cluster.lookup-unhashed: auto >> cluster.lookup-optimize: on >> cluster.favorite-child-policy: mtime >> server.allow-insecure: on >> transport.address-family: inet >> client.bind-insecure: on >> cluster.entry-self-heal: off >> cluster.metadata-self-heal: off >> performance.md-cache-timeout: 600 >> cluster.self-heal-daemon: enable >> performance.readdir-ahead: on >> diagnostics.brick-log-level: INFO >> nfs.disable: off >> >> Thank you for any assistance. >> >> - Patrick >> _______________________________________________ >> Gluster-users mailing list >> [email protected] >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> >>
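To pull the ZFS and gluster tuning pieces from this thread together in one place, a rough sketch of the commands involved (the pool/dataset name is a placeholder; the values are the ones Darrell suggested above and should be sanity-checked against your own hardware before applying):

# ZFS properties (safe to change on a live pool; they affect new operations/data only)
zfs set xattr=sa pool/brick1
zfs set acltype=posixacl pool/brick1
zfs set compression=lz4 pool/brick1

# Gluster tunables discussed above
gluster volume set gvAA01 cluster.shd-max-threads 4
gluster volume set gvAA01 server.event-threads 4
gluster volume set gvAA01 client.event-threads 8
gluster volume set gvAA01 performance.io-thread-count 32
gluster volume set gvAA01 performance.least-prio-threads 1
# only once both storage nodes are on 3.12.15:
gluster volume set gvAA01 performance.stat-prefetch on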
_______________________________________________ Gluster-users mailing list [email protected] https://lists.gluster.org/mailman/listinfo/gluster-users
