I apologise for this being posted twice - I'm not sure if that was user error 
or a bug in the mailing list, but the list hadn't shown my post after quite 
some time, so I sent a second email, which showed up almost immediately - 
that's mailing lists, I guess...

Anyway, if anyone has any input, advice or abuse, I'd welcome it!

--
Sam McLeod
https://smcleod.net
https://twitter.com/s_mcleod

> On 3 Sep 2018, at 1:20 pm, Sam McLeod <[email protected]> wrote:
> 
> We've got an odd problem where clients are blocked from writing to Gluster 
> volumes until the first node of the Gluster cluster is rebooted.
> 
> I suspect I've either misconfigured the arbiter / replica setup of the 
> volumes, or that we're triggering some sort of bug in the Gluster 
> client-server connection handling.
> 
> Has anyone seen this before, or could someone point me in the right 
> direction?
> 
> 
> Environment:
> Topology: 3 node cluster, replica 2, arbiter 1 (third node is metadata only).
> Version: client and servers both running 4.1.3 on CentOS 7, kernel 4.18.x; 
> (Xen) VMs with relatively fast networked SSD storage backing them, XFS.
> Client: native Gluster FUSE client, mounting via the Kubernetes provider
> 
> Problem:
> Seemingly at random, some clients are blocked from / unable to write to what 
> should be a highly available Gluster volume.
> The client's Gluster logs show it failing to perform new file operations 
> across various volumes and against all three nodes of the cluster.
> The server-side Gluster (and OS) logs do not show any warnings or errors.
> The client recovers and is able to write to volumes again after the first 
> node of the Gluster cluster is rebooted.
> Until the first node is rebooted, the client fails to write to the volume 
> that is (or should be) available on the second node (a replica) and the 
> third node (an arbiter-only node).
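> 
> For what it's worth, the usual health checks all look clean while a client 
> is blocked - i.e. nothing like the following (standard Gluster CLI, run 
> from the first node) reports a problem:
> 
> # gluster peer status
> # gluster volume status staging_static
> # gluster volume heal staging_static info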
> 
> What 'fixes' the issue:
> Although the clients (Kubernetes hosts) connect to all 3 nodes of the Gluster 
> cluster, restarting the first Gluster node always unblocks the IO and allows 
> the client to continue writing.
> Stopping and starting the glusterd service on that server is not enough to 
> fix the issue, nor is restarting its networking.
> This suggests to me that the volume is unavailable for writing for some 
> reason, and that rebooting the first node clears some sort of stale TCP 
> session, either between client and server or between the servers themselves 
> (replication traffic).
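> 
> If the stale-TCP-session theory holds, the client's connections to the 
> brick ports (49154 in the logs below) should show up as established but 
> dead while it's blocked - capturing something like this on the client is 
> on my list for next time (generic Linux tooling, nothing Gluster-specific):
> 
> # ss -tnp | grep 49154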
> 
> Expected behaviour:
> 
> If the first Gluster node / server had failed or was blocked from performing 
> operations for some reason (which it doesn't appear to be), I'd expect the 
> clients to access data from the second Gluster node and write metadata to 
> the third node as well, since it's an arbiter / metadata-only node.
> If for some reason a Gluster node was not able to serve connections to 
> clients, I'd expect to see errors in the volume, glusterd or brick log files 
> (there are none on the first Gluster node).
> If the first Gluster node was for some reason blocking IO on a volume, I'd 
> expect that node to show as unhealthy or unavailable in gluster peer status 
> or gluster volume status output.
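> 
> One thing I still intend to rule out is the quorum configuration - with 
> replica 2 + arbiter I believe it's the client-side quorum (cluster.quorum-type, 
> which I understand defaults to auto) that decides when writes are refused, 
> so I'll be checking:
> 
> # gluster volume get staging_static cluster.quorum-type
> # gluster volume get staging_static cluster.server-quorum-type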
> 
> 
> Client gluster errors:
> 
> staging_static in this example is the volume name.
> You can see the client trying to connect to the second and third nodes of 
> the Gluster cluster and failing (I'm unsure as to why).
> The server-side logs on the first Gluster node do not show any errors or 
> problems, but the second / third nodes show errors in glusterd.log when 
> trying to 'unlock' the 0-management volume on the first node.
> 
> 
> On a Gluster client (a Kubernetes host using the Kubernetes connector, which 
> uses the native FUSE client) when it's blocked from writing but the cluster 
> appears healthy (other than the errors mentioned later):
> 
> [2018-09-02 15:33:22.750874] E [rpc-clnt.c:184:call_bail] 
> 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) 
> op(INODELK(29)) xid = 0x1cce sent = 2018-09-02 15:03:22.417773. timeout = 
> 1800 for <ip of third gluster node>:49154
> [2018-09-02 15:33:22.750989] E [MSGID: 114031] 
> [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: 
> remote operation failed [Transport endpoint is not connected]
> [2018-09-02 16:03:23.097905] E [rpc-clnt.c:184:call_bail] 
> 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) 
> op(INODELK(29)) xid = 0x2e21 sent = 2018-09-02 15:33:22.765751. timeout = 
> 1800 for <ip of second gluster node>:49154
> [2018-09-02 16:03:23.097988] E [MSGID: 114031] 
> [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: 
> remote operation failed [Transport endpoint is not connected]
> [... the same call_bail / 'remote operation failed' pair then repeats every 
> 30 minutes, alternating between 0-staging_static-client-2 (third node) and 
> 0-staging_static-client-1 (second node), through to 2018-09-03 01:33:39 ...]
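> 
> The 'timeout = 1800' in each call_bail above looks like the default 
> network.frame-timeout (30 minutes), which would explain the exact half-hour 
> cadence - each INODELK appears to sit for the full frame timeout before 
> bailing. If anyone thinks it's relevant, I assume it can be confirmed (and 
> temporarily lowered while debugging) with:
> 
> # gluster volume get staging_static network.frame-timeout
> # gluster volume set staging_static network.frame-timeout 300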
> 
> 
> On the second gluster server:
> 
> 
> We are seeing the following error in the glusterd.log file when the client 
> is blocked from writing to the volume. I think this is the most important 
> piece of information about the failure - it suggests a problem with the 
> first node, but it doesn't explain the client's behaviour:
> 
> [2018-09-02 08:31:03.902272] E [MSGID: 106115] 
> [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking 
> failed on <FQDN of the first gluster node>. Please check log file for details.
> [2018-09-02 08:31:03.902477] E [MSGID: 106151] 
> [glusterd-syncop.c:1640:gd_unlock_op_phase] 0-management: Failed to unlock on 
> some peer(s)
> 
> Notes on the above error:
> 
> 1. I'm not sure which log file to check - there doesn't seem to be a 
> 'management' brick / brick log (my best guesses are below).
> 2. If there's a problem with the first node, why isn't it rejected from the 
> cluster / taken offline, or the peer or volume status marked as degraded?
> 3. Why does the client fail to write to the volume rather than (as I'm 
> assuming it should) trying the second node - or, for metadata, the third?
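> 
> (For 1. above, my best guess is that 'the log file' means glusterd's own 
> logs on the first node rather than a brick log, i.e.:
> 
> # less /var/log/glusterfs/glusterd.log
> # less /var/log/glusterfs/cmd_history.log
> 
> but neither shows anything at the matching timestamps.)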
> 
> 
> We are also seeing the following errors repeated frequently in the brick log 
> (/var/log/glusterfs/bricks/mnt-gluster-storage-staging_static.log), both 
> when the volumes are working and when there's an issue:
> 
> [2018-09-03 01:58:35.128923] E [server.c:137:server_submit_reply] 
> (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) 
> [0x7f8470319d14] 
> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) 
> [0x7f846bdde24a] 
> -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) 
> [0x7f846bd89fce] ) 0-: Reply submission failed
> [2018-09-03 01:58:35.128957] E [rpcsvc.c:1378:rpcsvc_submit_generic] 
> 0-rpc-service: failed to submit message (XID: 0x3d60, Program: GlusterFS 4.x 
> v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)
> [... the same 'Reply submission failed' / 'failed to submit message' pair 
> repeats continuously with different XIDs ...]
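> 
> Given the INODELK timeouts, my working theory is a stale inode lock that is 
> somehow held via the first node. Next time this happens I plan to take a 
> statedump and inspect the brick lock tables before rebooting, and possibly 
> try clear-locks (which I understand should be a last resort):
> 
> # gluster volume statedump staging_static
> # grep -i -A5 inodelk /var/run/gluster/*dump*
> # gluster volume clear-locks staging_static / kind granted inode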
> 
> 
> 
> Gluster volume information:
> 
> 
> # gluster volume info staging_static
> 
> Volume Name: staging_static
> Type: Replicate
> Volume ID: 7f3b8e91-afea-4fc6-be83-3399a089b6f3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: <first gluster node.fqdn>:/mnt/gluster-storage/staging_static
> Brick2: <second gluster node.fqdn>:/mnt/gluster-storage/staging_static
> Brick3: <third gluster node.fqdn>:/mnt/gluster-storage/staging_static 
> (arbiter)
> Options Reconfigured:
> storage.fips-mode-rchecksum: true
> cluster.self-heal-window-size: 16
> cluster.shd-wait-qlength: 4096
> cluster.shd-max-threads: 8
> performance.cache-min-file-size: 2KB
> performance.rda-cache-limit: 1GB
> network.inode-lru-limit: 50000
> server.outstanding-rpc-limit: 256
> transport.listen-backlog: 2048
> performance.write-behind-window-size: 512MB
> performance.stat-prefetch: true
> performance.io-thread-count: 16
> performance.client-io-threads: true
> performance.cache-size: 1GB
> performance.cache-refresh-timeout: 60
> performance.cache-invalidation: true
> cluster.use-compound-fops: true
> cluster.readdir-optimize: true
> cluster.lookup-optimize: true
> cluster.favorite-child-policy: size
> cluster.eager-lock: true
> client.event-threads: 4
> nfs.disable: on
> transport.address-family: inet
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> features.cache-invalidation-timeout: 300
> features.cache-invalidation: true
> network.ping-timeout: 15
> performance.cache-max-file-size: 3MB
> performance.md-cache-timeout: 300
> server.event-threads: 4
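> 
> (For completeness, quite a few of the options above are tuned away from 
> their defaults; the full effective configuration can be compared against 
> the defaults with:
> 
> # gluster volume get staging_static all
> 
> in case one of the performance / caching settings is implicated.)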
> 
> Thanks in advance,
> 
> 
> --
> Sam McLeod (protoporpoise on IRC)
> https://smcleod.net
> https://twitter.com/s_mcleod
> 
> Words are my own opinions and do not necessarily represent those of my 
> employer or partners.
> 

_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
