Use replace-brick commit force. @Pranith/@Anuradha - after this, will self-heal be triggered automatically, or is a manual trigger needed?
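
For reference, a minimal sketch of that sequence; the volume and brick names
below (v1, server2:/gfs/b1/v1, server2:/gfs/b1/v1_new) are placeholders, not
taken from this thread:

# replace the dead brick with a freshly created empty one on the same node
gluster volume replace-brick v1 server2:/gfs/b1/v1 server2:/gfs/b1/v1_new commit force

# if the heal does not start on its own, it can be triggered and watched manually
gluster volume heal v1 full
gluster volume heal v1 info
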
On Thursday 4 August 2016, Andres E. Moya <[email protected]> wrote:
> Does anyone else have input?
>
> We are currently only running off 1 node, and one node is offline in the
> replicate brick.
>
> We are not experiencing any downtime because the 1 node is up.
>
> I do not understand which is the best way to bring up a second node.
>
> Do we just re-create a file system on the node that is down, plus the mount
> points, and allow gluster to heal? (My concern with this is whether the node
> that is down will somehow take precedence and wipe out the data on the
> healthy node instead of vice versa.)
>
> Or do we fully wipe out the config on the node that is down, re-create the
> file system, re-add the node that is down into gluster using the add-brick
> command with replica 3, wait for it to heal, and then run the remove-brick
> command for the failed brick?
>
> Which would be the safest and easiest to accomplish?
>
> Thanks for any input
>
> ------------------------------
> *From: *"Leno Vo" <[email protected]>
> *To: *"Andres E. Moya" <[email protected]>
> *Cc: *"gluster-users" <[email protected]>
> *Sent: *Tuesday, August 2, 2016 6:45:27 PM
> *Subject: *Re: [Gluster-users] Failed file system
>
> If you don't want any downtime (in the case that your node 2 really dies),
> you have to create a new gluster SAN (if you have the resources, of course;
> 3 nodes as much as possible this time), and then just migrate your VMs (or
> files), therefore no downtime, but you have to cross your fingers that the
> only remaining node will not die too... Also, without sharding, the VM
> migration, especially an RDP one, will be slow for users to access until it
> has migrated.
>
> You have to start testing sharding, it's fast and cool...
>
>
> On Tuesday, August 2, 2016 2:51 PM, Andres E. Moya <[email protected]> wrote:
>
> Couldn't we just add a new server by:
>
> gluster peer probe
> gluster volume add-brick replica 3 (will this command succeed with 1
> current failed brick?)
>
> let it heal, then
>
> gluster volume remove-brick
> ------------------------------
> *From: *"Leno Vo" <[email protected]>
> *To: *"Andres E. Moya" <[email protected]>, "gluster-users" <[email protected]>
> *Sent: *Tuesday, August 2, 2016 1:26:42 PM
> *Subject: *Re: [Gluster-users] Failed file system
>
> You need to have downtime to recreate the second node. Two nodes is
> actually not good for production, and you should have put RAID 1 or RAID 5
> under your gluster storage. When you recreate the second node you might try
> running some VMs that need to be up while keeping the rest of the VMs down,
> but stop all backups, and if you have replication, stop it too. If you have
> a 1G NIC, 2 CPUs and less than 8G of RAM, then I suggest turning off all the
> VMs during recreation of the second node. Someone said that if you have
> sharding with 3.7.x, maybe some VIP VMs can stay up...
>
> If it is just a filesystem, then just turn off the backup service until you
> recreate the second node. Depending on your resources and how big your
> storage is, it might take hours to recreate it, or even days...
>
> Here's my process on recreating the second or third node (copied and
> modified from the net):
>
> # Make sure the partition is already added!
> This procedure is for replacing a failed server, IF your newly installed
> server has the same hostname as the failed one:
>
> (If your new server will have a different hostname, see this article
> instead.)
>
> For purposes of this example, the server that crashed will be server3 and
> the other servers will be server1 and server2.
>
> On both server1 and server2, make sure hostname server3 resolves to the
> correct IP address of the new replacement server.
>
> # On either server1 or server2, do
> grep server3 /var/lib/glusterd/peers/*
>
> This will return a uuid followed by ":hostname1=server3"
>
> # On server3, make sure glusterd is stopped, then do
> echo UUID={uuid from previous step} > /var/lib/glusterd/glusterd.info
>
> # Actual testing below:
> [root@node1 ~]# cat /var/lib/glusterd/glusterd.info
> UUID=4b9d153c-5958-4dbe-8f91-7b5002882aac
> operating-version=30710
> # The second line is new... maybe not needed...
>
> On server3:
> make sure that all brick directories are created/mounted
> start glusterd
> peer probe one of the existing servers
>
> # Restart glusterd, then check that the full peer list has been populated using
> gluster peer status
>
> (If peers are missing, probe them explicitly, then restart glusterd again.)
>
> # Check that the full volume configuration has been populated using
> gluster volume info
>
> If the volume configuration is missing, do:
>
> # On the other node
> gluster volume sync "replace-node" all
>
> # On the node to be replaced
> setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/v1/info | cut -d= -f2 | sed 's/-//g') /gfs/b1/v1
> setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/v2/info | cut -d= -f2 | sed 's/-//g') /gfs/b2/v2
> setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id /var/lib/glusterd/vols/config/info | cut -d= -f2 | sed 's/-//g') /gfs/b1/config/c1
>
> mount -t glusterfs localhost:config /data/data1
>
> # Install ctdb if not yet installed and put it back online; use the steps for
> # creating the ctdb config, but use your common sense not to delete or modify
> # the current one.
>
> gluster vol heal v1 full
> gluster vol heal v2 full
> gluster vol heal config full
>
>
> On Tuesday, August 2, 2016 11:57 AM, Andres E. Moya <[email protected]> wrote:
>
> Hi, we have a 2-node replica setup.
> On 1 of the nodes the file system that had the brick on it failed, not the OS.
> Can we re-create a file system and mount the bricks on the same mount point?
>
> What will happen: will the data from the other node sync over, or will the
> failed node wipe out the data on the other node?
>
> What would be the correct process?
>
> Thanks in advance for any help
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://www.gluster.org/mailman/listinfo/gluster-users

--
--Atin
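
For reference, a minimal sketch of the add-brick / remove-brick alternative
discussed in the thread above, again with placeholder names only (volume v1,
failed brick server2:/gfs/b1/v1, replacement node server3):

gluster peer probe server3
gluster volume add-brick v1 replica 3 server3:/gfs/b1/v1

# let self-heal copy the data onto the new brick; progress can be checked with
gluster volume heal v1 full
gluster volume heal v1 info

# once nothing is pending, drop back to replica 2 by removing the failed brick
gluster volume remove-brick v1 replica 2 server2:/gfs/b1/v1 force
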
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
