Is it possible to attach logfiles of problematic client and bricks? On Thu, Mar 1, 2018 at 3:00 AM, Ryan Lee <[email protected]> wrote:
> We've been on the Gluster 3.7 series for several years with things pretty > stable. Given that it's reached EOL, yesterday I upgraded to 3.13.2. > Every Gluster mount and server was disabled then brought back up after the > upgrade, changing the op-version to 31302 and then trying it all out. > > It went poorly. Every sizable read and write (100's MB) lead to > 'Transport endpoint not connected' errors on the command line and immediate > unavailability of the mount. After unsuccessfully trying to search for > similar problems with solutions, I ended up downgrading to 3.12.6 and > changing the op-version to 31202. That brought us back to usability with > the majority of those operations succeeding enough to consider it online, > but there are still occasional mount disconnects that we never saw with 3.7 > - about 6 in the past 18 hours. It seems these disconnects would never > come back, either, unless manually re-mounted. Manually remounting > reconnects immediately. They only disconnect the affected client, though > some simultaneous disconnects have occurred due to simultaneous activity. > The lower-level log info seems to indicate a socket problem, potentially > broken on the client side based on timing (but the timing is narrow, and I > won't claim the clocks are that well synchronized across all our servers). > The client and one server claim a socket polling error with no data > available, and the other server claims a writev error. This seems to lead > the client to the 'all subvolumes are down' state, even though all other > clients are still connected. Has anybody run into this? Did I miss > anything moving so many versions ahead? > > I've included the output of volume info and some excerpts from the logs. > We have two servers running glusterd and two replica volumes with a brick > on each server. Both experience disconnects; there are about 10 clients > for each, with one using both. We use SSL over internal IPv4. Names in all > caps were replaced, as were IP addresses. > > Let me know if there's anything else I can provide. > > % gluster v info VOL > Volume Name: VOL > Type: Replicate > Volume ID: 3207155f-02c6-447a-96c4-5897917345e0 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: SERVER1:/glusterfs/VOL-brick1/data > Brick2: SERVER2:/glusterfs/VOL-brick2/data > Options Reconfigured: > config.transport: tcp > features.selinux: off > transport.address-family: inet > nfs.disable: on > client.ssl: on > performance.readdir-ahead: on > auth.ssl-allow: [NAMES, including CLIENT] > server.ssl: on > ssl.certificate-depth: 3 > > Log excerpts (there was nothing related in glusterd.log): > > CLIENT:/var/log/glusterfs/mnt-VOL.log > [2018-02-28 19:35:58.378334] E [socket.c:2648:socket_poller] > 0-VOL-client-1: socket_poller SERVER2:49153 failed (No data available) > [2018-02-28 19:35:58.477154] E [MSGID: 108006] > [afr-common.c:5164:__afr_handle_child_down_event] 0-VOL-replicate-0: All > subvolumes are down. Going offline until atleast one of them comes back up. > [2018-02-28 19:35:58.486146] E [MSGID: 101046] > [dht-common.c:1501:dht_lookup_dir_cbk] 0-VOL-dht: dict is null <67 times> > <lots of saved_frames_unwind messages> > [2018-02-28 19:38:06.428607] E [socket.c:2648:socket_poller] > 0-VOL-client-1: socket_poller SERVER2:24007 failed (No data available) > [2018-02-28 19:40:12.548650] E [socket.c:2648:socket_poller] > 0-VOL-client-1: socket_poller SERVER2:24007 failed (No data available) > > <manual umount / mount> > > > SERVER2:/var/log/glusterfs/bricks/VOL-brick2.log > [2018-02-28 19:35:58.379953] E [socket.c:2632:socket_poller] > 0-tcp.VOL-server: poll error on socket > [2018-02-28 19:35:58.380530] I [MSGID: 115036] > [server.c:527:server_rpc_notify] 0-VOL-server: disconnecting connection > from CLIENT-30688-2018/02/28-03:11:39:784734-VOL-client-1-0-0 > [2018-02-28 19:35:58.380932] I [socket.c:3672:socket_submit_reply] > 0-tcp.VOL-server: not connected (priv->connected = -1) > [2018-02-28 19:35:58.380960] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0xa4e, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 25) to rpc-transport (tcp.uploads-server) > [2018-02-28 19:35:58.381124] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.6/xlator/debug/io-stats.so(+0x1ae6a) > [0x7f97bd37ee6a] -->/usr/lib/x86_64-linux-gnu/g > lusterfs/3.12.6/xlator/protocol/server.so(+0x1d4c8) [0x7f97bcf1f4c8] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.6/xlator/protocol/server.so(+0x8bd5) > [0x7f97bcf0abd5] ) 0-: Reply submission failed > [2018-02-28 19:35:58.381196] I [MSGID: 101055] > [client_t.c:443:gf_client_unref] 0-VOL-server: Shutting down connection > CLIENT-30688-2018/02/28-03:11:39:784734-VOL-client-1-0-0 > [2018-02-28 19:40:58.351350] I [addr.c:55:compare_addr_and_update] > 0-/glusterfs/uploads-brick2/data: allowed = "*", received addr = "CLIENT" > [2018-02-28 19:40:58.351684] I [login.c:34:gf_auth] 0-auth/login: > connecting user name: CLIENT > > SERVER1:/var/log/glusterfs/bricks/VOL-brick1.log > [2018-02-28 19:35:58.509713] W [socket.c:593:__socket_rwv] > 0-tcp.VOL-server: writev on CLIENT:49150 failed (No data available) > [2018-02-28 19:35:58.509839] E [socket.c:2632:socket_poller] > 0-tcp.VOL-server: poll error on socket > [2018-02-28 19:35:58.509957] I [MSGID: 115036] > [server.c:527:server_rpc_notify] 0-VOL-server: disconnecting connection > from CLIENT-30688-2018/02/28-03:11:39:784734-VOL-client-0-0-0 > [2018-02-28 19:35:58.510258] I [socket.c:3672:socket_submit_reply] > 0-tcp.VOL-server: not connected (priv->connected = -1) > [2018-02-28 19:35:58.510281] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x4b3f, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 25) to rpc-transport (tcp.VOL-server) > [2018-02-28 19:35:58.510357] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.6/xlator/debug/io-stats.so(+0x1ae6a) > [0x7f85bb7a8e6a] -->/usr/lib/x86_64-linux-gnu/g > lusterfs/3.12.6/xlator/protocol/server.so(+0x1d4c8) [0x7f85bb3494c8] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.6/xlator/protocol/server.so(+0x8bd5) > [0x7f85bb334bd5] ) 0-: Reply submission failed > [2018-02-28 19:35:58.510409] I [MSGID: 101055] > [client_t.c:443:gf_client_unref] 0-VOL-server: Shutting down connection > CLIENT-30688-2018/02/28-03:11:39:784734-VOL-client-0-0-0 > [2018-02-28 19:40:58.364068] I [addr.c:55:compare_addr_and_update] > 0-/glusterfs/uploads-brick1/data: allowed = "*", received addr = "CLIENT" > [2018-02-28 19:40:58.364137] I [login.c:34:gf_auth] 0-auth/login: > connecting user name: CLIENT > _______________________________________________ > Gluster-users mailing list > [email protected] > http://lists.gluster.org/mailman/listinfo/gluster-users >
_______________________________________________ Gluster-users mailing list [email protected] http://lists.gluster.org/mailman/listinfo/gluster-users
