Can you share the glusterd logs from the three nodes?
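In case it helps: the glusterd log normally lives under /var/log/glusterfs/ on each node; depending on the version it is called either glusterd.log or etc-glusterfs-glusterd.vol.log (that is an assumption about your setup, so adjust the path if yours differs). Something like this on each of sst0, sst1 and sst2 should be enough:

sst0# ls /var/log/glusterfs/
sst0# tar czf /tmp/sst0-glusterd-logs.tar.gz /var/log/glusterfs/*glusterd*.log
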
Rafi KC

On 04/28/2017 02:34 PM, Seva Gluschenko wrote:
> Dear Community,
>
>
> I call for your wisdom, as it appears that googling for keywords doesn't
> help much.
>
> I have a glusterfs volume with replica count 2, and I tried to perform the
> online upgrade procedure described in the docs
> (http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/).
> It all went almost fine once I was done with the first replica; the only
> problem was the self-heal procedure, which refused to complete until I
> commented out all IPv6 entries in /etc/hosts.
>
> Being sure that it would all work on the 2nd replica pretty much the same
> as it did on the 1st one, I proceeded with the upgrade on replica 2. All of
> a sudden, it told me that it doesn't see the first replica at all. The
> state before the upgrade was:
>
> sst2# gluster volume status
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick sst0:/var/glusterfs                   49152     0          Y       3482
> Brick sst2:/var/glusterfs                   49152     0          Y       29863
> NFS Server on localhost                     2049      0          Y       25175
> Self-heal Daemon on localhost               N/A       N/A        Y       25283
> NFS Server on sst0                          N/A       N/A        N       N/A
> Self-heal Daemon on sst0                    N/A       N/A        Y       4827
> NFS Server on sst1                          N/A       N/A        N       N/A
> Self-heal Daemon on sst1                    N/A       N/A        Y       15009
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Peer in Cluster (Connected)
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Sent and Received peer request (Connected)
>
> sst2# gluster volume heal gv0 info
> Brick sst0:/var/glusterfs
> Number of entries: 0
>
> Brick sst2:/var/glusterfs
> Number of entries: 0
>
>
> After the upgrade, it looked like this:
>
> sst2# gluster volume status
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick sst2:/var/glusterfs                   N/A       N/A        N       N/A
> NFS Server on localhost                     N/A       N/A        N       N/A
> NFS Server on localhost                     N/A       N/A        N       N/A
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Sent and Received peer request (Connected)
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Peer Rejected (Connected)
>
>
> Probably my biggest fault: at that point I googled, found this article
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> and followed its advice, removing on sst2 all the /var/lib/glusterd
> contents except the glusterd.info file. As a result, the node predictably
> lost all information about the volume.
>
> sst2# gluster volume status
> No volumes present
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Accepted peer request (Connected)
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Accepted peer request (Connected)
>
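A quick note on this step: if I remember that doc page correctly, wiping /var/lib/glusterd (everything except glusterd.info) is supposed to be followed by a glusterd restart on the cleaned node, a fresh probe of a healthy peer, and then one more restart, so that the node moves back to "Peer in Cluster" and pulls the volume configuration over. The "Accepted peer request" state above suggests that handshake never completed. Roughly (a sketch only; the exact service commands depend on your distro):

sst2# systemctl stop glusterd          (or: service glusterd stop)
sst2# ls /var/lib/glusterd             (only glusterd.info should remain)
sst2# systemctl start glusterd
sst2# gluster peer probe sst0
sst2# systemctl restart glusterd
sst2# gluster peer status              (sst0 and sst1 should show "Peer in Cluster")
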
> Okay, I thought, it might be high time to re-add the brick. Not that easy,
> Jack:
>
> sst0# gluster volume add-brick gv0 replica 2 'sst2:/var/glusterfs'
> volume add-brick: failed: Operation failed
>
> The reason appeared to be natural: sst0 still knows that there was a
> replica on sst2. What should I do then? At this point, I tried to recover
> the volume information on sst2 by taking it offline and copying all the
> volume info over from sst0. Of course, it wasn't enough to just copy it as
> is; I modified /var/lib/glusterd/vols/gv0/sst*\:-var-glusterfs, setting
> listen-port=0 for the remote brick (sst0) and listen-port=49152 for the
> local brick (sst2). It didn't help much, unfortunately. The final state
> I've reached is as follows:
>
> sst2# gluster peer status
> Number of Peers: 2
>
> Hostname: sst1
> Uuid: 5a2198de-f536-4328-a278-7f746f276e35
> State: Sent and Received peer request (Connected)
>
> Hostname: sst0
> Uuid: 26b35bd7-ad7e-4a25-a3f9-70002771e1fc
> State: Sent and Received peer request (Connected)
>
> sst2# gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: dd4996c0-04e6-4f9b-a04e-73279c4f112b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: sst0:/var/glusterfs
> Brick2: sst2:/var/glusterfs
> Options Reconfigured:
> cluster.self-heal-daemon: enable
> performance.readdir-ahead: on
> storage.owner-uid: 1000
> storage.owner-gid: 1000
>
> sst2# gluster volume status
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick sst2:/var/glusterfs                   N/A       N/A        N       N/A
> NFS Server on localhost                     N/A       N/A        N       N/A
> NFS Server on localhost                     N/A       N/A        N       N/A
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> Meanwhile, on sst0:
>
> sst0# gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: dd4996c0-04e6-4f9b-a04e-73279c4f112b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: sst0:/var/glusterfs
> Brick2: sst2:/var/glusterfs
> Options Reconfigured:
> storage.owner-gid: 1000
> storage.owner-uid: 1000
> performance.readdir-ahead: on
> cluster.self-heal-daemon: enable
>
> sst0 ~ # gluster volume status
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick sst0:/var/glusterfs                   49152     0          Y       31263
> NFS Server on localhost                     N/A       N/A        N       N/A
> Self-heal Daemon on localhost               N/A       N/A        Y       31254
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> Any ideas on how to bring sst2 back to normal are appreciated. As a last
> resort, I can schedule downtime, back up the data, kill the volume and
> start all over, but I would like to know if there is a shorter path.
> Thank you very much in advance.
>
> --
> Best Regards,
>
> Seva Gluschenko
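
One more thought while you gather the logs: since sst0 still holds a complete copy of the volume definition, hand-editing listen-port in the files under /var/lib/glusterd/vols/gv0 shouldn't be needed; as far as I know, glusterd rewrites the brick port itself once the brick process comes up. Once the peers are back in "Peer in Cluster" state, something along these lines usually brings the second node back (again a sketch, assuming sst0 is the healthy node and sst2 the one being repaired):

sst2# gluster volume sync sst0 all       (pull the volume definition from sst0)
sst2# systemctl restart glusterd         (or: service glusterd restart)
sst2# gluster volume status gv0
sst2# gluster volume heal gv0 info

If the brick on sst2 still refuses to start after that, the brick log under /var/log/glusterfs/bricks/ should say why. It is also worth checking that operating-version in /var/lib/glusterd/glusterd.info is the same on all three nodes after the upgrade; a mismatch there is a common reason for a peer being rejected right after an upgrade.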
