Thanks for that advice. It worked. Setting the UUID in glusterd.info was the bit I missed.
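For reference, /var/lib/glusterd/glusterd.info is just a small key=value file. After the edit it looks roughly like the sketch below, where the UUID is the one the rest of the pool already had for thalia; the operating-version value is only illustrative, so leave whatever your install wrote there:

    UUID=843169fa-3937-42de-8fda-9819efc75fe8
    operating-version=30800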
It seemed to work without the setfattr step in my particular case.

On Thu, Sep 22, 2016 at 11:05 AM, Serkan Çoban <[email protected]> wrote:
> Here are the steps for replacing a failed node:
>
> 1- On one of the other servers run "grep thalia /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6" and note the UUID
> 2- Stop glusterd on the failed server, add "UUID=uuid_from_previous_step" to /var/lib/glusterd/glusterd.info and start glusterd
> 3- Run "gluster peer probe calliope"
> 4- Restart glusterd
> 5- Now "gluster peer status" should show all the peers. If not, probe them manually as above.
> 6- For all the bricks run the command "setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/vol_name/info | cut -d= -f2 | sed 's/-//g') brick_name"
> 7- Restart glusterd and everything should be fine.
>
> I think I read the steps from this link:
> https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/
> Look at the "keep the IP address" part.
>
>
> On Thu, Sep 22, 2016 at 5:16 PM, Tony Schreiner <[email protected]> wrote:
> > I set up a dispersed volume with 1 x (3 + 1) nodes (I do know that 3+1 is not optimal).
> > Originally created in version 3.7 but recently upgraded without issue to 3.8.
> >
> > # gluster vol info
> > Volume Name: rvol
> > Type: Disperse
> > Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (3 + 1) = 4
> > Transport-type: tcp
> > Bricks:
> > Brick1: calliope:/brick/p1
> > Brick2: euterpe:/brick/p1
> > Brick3: lemans:/brick/p1
> > Brick4: thalia:/brick/p1
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: off
> >
> > I inadvertently allowed one of the nodes (thalia) to be reinstalled, which overwrote the system, but not the brick, and I need guidance in getting it back into the volume.
> >
> > (on lemans)
> > gluster peer status
> > Number of Peers: 3
> >
> > Hostname: calliope
> > Uuid: 72373eb1-8047-405a-a094-891e559755da
> > State: Peer in Cluster (Connected)
> >
> > Hostname: euterpe
> > Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
> > State: Peer in Cluster (Connected)
> >
> > Hostname: thalia
> > Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
> > State: Peer Rejected (Connected)
> >
> > The thalia peer is rejected. If I try to peer probe thalia I am told it is already part of the pool. If, from thalia, I try to peer probe one of the others, I am told that they are already part of another pool.
> >
> > I have tried removing the thalia brick with
> > gluster vol remove-brick rvol thalia:/brick/p1 start
> > but get the error
> > volume remove-brick start: failed: Remove brick incorrect brick count of 1 for disperse 4
> >
> > I am not finding much guidance for this particular situation. I could use a suggestion on how to recover. It's a lab situation so no biggie if I lose it.
> > Cheers
> >
> > Tony Schreiner
> >
> > _______________________________________________
> > Gluster-users mailing list
> > [email protected]
> > http://www.gluster.org/mailman/listinfo/gluster-users
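For anyone who lands on this thread later, the quoted steps condense to roughly the sequence below. This is only a sketch based on the names used in this thread (failed node thalia, healthy peer calliope, volume rvol, brick path /brick/p1) and assumes a systemd-managed glusterd service; it is not a single script, since step 1 runs on a surviving peer and the rest on the reinstalled node:

    # 1. On a surviving peer: recover the UUID the pool already has for the failed node.
    #    The files in /var/lib/glusterd/peers/ are named after the peer UUIDs.
    OLD_UUID=$(grep thalia /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6)
    echo "$OLD_UUID"

    # 2. On the reinstalled node: stop glusterd and put the old UUID back into
    #    glusterd.info (replace the UUID= line, or add one if it is missing).
    systemctl stop glusterd
    sed -i "s/^UUID=.*/UUID=<uuid_from_step_1>/" /var/lib/glusterd/glusterd.info
    systemctl start glusterd

    # 3. Still on the reinstalled node: re-probe a healthy peer, then restart glusterd.
    gluster peer probe calliope
    systemctl restart glusterd

    # 4. Verify that every peer now shows "Peer in Cluster (Connected)".
    gluster peer status

    # 5. Only if a brick refuses to start (not needed in the case described above):
    #    restore the volume-id xattr on the brick root, then restart glusterd once more.
    setfattr -n trusted.glusterfs.volume-id \
        -v 0x$(grep volume-id /var/lib/glusterd/vols/rvol/info | cut -d= -f2 | sed 's/-//g') \
        /brick/p1
    systemctl restart glusterd

On non-systemd hosts, substitute the distribution's own service commands for the systemctl calls.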
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
