Hi Jeevan,

You might be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1635820

Were any of the volumes in "Created" state when the peer reject issue was seen?
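A quick way to check is to list every volume together with its state and look
for "Status: Created" (a rough sketch, run on any node in the pool that has the
gluster CLI available):

    # print each volume's name and its current state
    gluster volume info | grep -E '^(Volume Name|Status):'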
Thanks,
Sanju

On Mon, Nov 26, 2018 at 9:35 AM Jeevan Patnaik <[email protected]> wrote:

> Hi Atin,
>
> Thanks for the details. I think the issue is with a few of the nodes that
> aren't serving any bricks and are in the rejected state. When I remove them
> from the pool and stop glusterfs on those nodes, everything seems normal.
>
> We keep those nodes as spares, but have glusterd running, because in our
> configuration the servers are also clients and we are using Gluster NFS
> without failover for the mounts. To localize the impact if a node goes
> down, we use localhost as the NFS server on each node, i.e.:
>
> mount -t nfs localhost:/volume /mountpoint
>
> So, glusterfs should be running on these spare nodes. Is it okay to keep
> those nodes in the pool? Will they go into the rejected state again and
> cause transaction locks? Why aren't they in sync even though they're part
> of the pool?
>
> Regards,
> Jeevan.
>
> On Mon, Nov 26, 2018, 8:22 AM Atin Mukherjee <[email protected]> wrote:
>
>> On Mon, Nov 26, 2018 at 8:21 AM Atin Mukherjee <[email protected]> wrote:
>>
>>> On Sun, Nov 25, 2018 at 8:40 PM Jeevan Patnaik <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am getting the output "Another transaction is in progress" with a few
>>>> gluster volume commands, including the stop command. And the gluster
>>>> volume status command just hangs and fails with a timeout error.
>>>>
>>> This is primarily because glusterd is not allowed to complete its
>>> handshake with the others when the glusterd services are restarted
>>> concurrently (as I understand from your previous email to the list). With
>>> GlusterD (read as GD1) this is a current design challenge: because of its
>>> N x N handshaking mechanism at the restart sequence, which brings all the
>>> configuration data into a consistent state, we have seen that the overall
>>> recovery time of the cluster can take very long if N is on the higher side
>>> (in your case N = 72, which is certainly high). Hence the recommendation
>>> is not to restart the glusterd services concurrently and to wait for the
>>> handshaking to complete.
>>>
>> Forgot to mention that GlusterD2 (https://github.com/gluster/glusterd2),
>> which is in the development phase, addresses this design problem.
>>
>>>> So, I want to find out which transaction is hung, and how can I know
>>>> this? I ran the volume statedump command, but didn't wait until it
>>>> completed to check whether it was hung or giving any result, as it was
>>>> also taking time.
>>>>
>>> kill -SIGUSR1 $(pidof glusterd) should get you a glusterd statedump file
>>> in /var/run/gluster, which can point to a backtrace dump at the bottom to
>>> show which transaction is currently in progress. In case a transaction is
>>> queued up for more than 180 seconds (which is not usual), the unlock timer
>>> kicks in and clears such locks.
>>>
>>>> Please help me with this; I'm struggling with these gluster timeout
>>>> errors :(
>>>>
>>>> Besides, I have also tuned transport.listen-backlog in gluster to 200
>>>> and the following kernel parameters to avoid SYN overflow rejects:
>>>> net.core.somaxconn = 1024
>>>> net.ipv4.tcp_max_syn_backlog = 20480
>>>>
>>>> Regards,
>>>> Jeevan.
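For reference, the statedump Atin describes above can be triggered and
inspected roughly like this (a sketch; the exact dump file name varies by
version, so just pick the newest file in /var/run/gluster):

    # ask glusterd to dump its state
    kill -SIGUSR1 $(pidof glusterd)
    # the newest file here is the fresh dump
    ls -lt /var/run/gluster/ | head
    # the backtrace near the bottom shows the transaction currently in progress
    tail -n 100 /var/run/gluster/<newest-dump-file>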
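And in case anyone wants to reproduce the kernel tuning Jeevan mentions, the
two sysctls can be applied like this (persist them in /etc/sysctl.conf or a
file under /etc/sysctl.d/ to survive a reboot):

    sysctl -w net.core.somaxconn=1024
    sysctl -w net.ipv4.tcp_max_syn_backlog=20480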
