Just for other users who may find this useful: I finally started the Gluster server process on the failed node that lost its brick and everything went OK. The server is available as a peer again and the failed brick is not running, so I can continue with the replace-brick / reset-brick operation.
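For anyone hitting the same situation, this is roughly how I checked that the node rejoined cleanly before proceeding (just the standard status commands, shown here as a sketch - substitute your own volume name):

# gluster peer status
# gluster volume status <volname>

In my case peer status showed node2 connected again, while volume status showed Brick2 still offline, which is expected since its backing ZFS pool is gone.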
> On 16 Apr 2019, at 17:44, Martin Toth <[email protected]> wrote:
> 
> Thanks for the clarification, one more question.
> 
> When I recover (boot) the failed node and this peer becomes available again
> to the remaining two nodes, how do I tell Gluster to mark this brick as
> failed?
> 
> I mean, I've booted the failed node back without networking. The disk
> partition (ZFS pool on other disks) where the brick lived before the failure
> is lost.
> Can I start Gluster even though I no longer have the ZFS pool where the
> failed brick was?
> 
> This won't be a problem when I connect this node back to the cluster
> (before the brick replace/reset command is issued)?
> 
> Thanks. BR!
> Martin
> 
>> On 11 Apr 2019, at 15:40, Karthik Subrahmanya <[email protected]> wrote:
>> 
>> 
>> 
>> On Thu, Apr 11, 2019 at 6:38 PM Martin Toth <[email protected]> wrote:
>> Hi Karthik,
>> 
>>> On Thu, Apr 11, 2019 at 12:43 PM Martin Toth <[email protected]> wrote:
>>> Hi Karthik,
>>> 
>>> Moreover, I would like to ask if there are any recommended
>>> settings/parameters for SHD in order to achieve good or fair I/O while
>>> the volume is being healed after I replace the brick (this should trigger
>>> the healing process).
>>> If I understand your concern correctly, you need fair I/O performance
>>> for clients while healing takes place as part of the replace-brick
>>> operation. For this you can turn off the "data-self-heal" and
>>> "metadata-self-heal" options until the heal completes on the new brick.
>> 
>> This is exactly what I mean. I am running VM disks on the remaining 2 (out
>> of 3 - one failed as mentioned) nodes and I need to ensure there will be
>> fair I/O performance available on these two nodes while the replace-brick
>> operation heals the volume.
>> I will not run any VMs on the node where the replace-brick operation will
>> be running. So if I understand correctly, when I set:
>> 
>> # gluster volume set <volname> cluster.data-self-heal off
>> # gluster volume set <volname> cluster.metadata-self-heal off
>> 
>> this will tell Gluster clients (libgfapi and FUSE mount) not to read from
>> the node "where the replace-brick operation" is in place but from the
>> remaining two healthy nodes. Is this correct? Thanks for the clarification.
>> The reads will be served from one of the good bricks, since the file will
>> either be not present on the replaced brick at the time of the read, or it
>> will be present but marked for heal if it is not already healed. If it has
>> already been healed by SHD, then it could be served from the new brick as
>> well, but there won't be any problem reading from there in that scenario.
>> By setting these two options, whenever a read comes from a client it will
>> not try to heal the file for data/metadata. Otherwise it would try to heal
>> (if not already healed by SHD) when the read comes in, hence slowing down
>> the client.
>> 
>>> Turning off client-side healing doesn't compromise data integrity and
>>> consistency. During a read request from the client, the pending xattr is
>>> evaluated for the replica copies and the read is only served from the
>>> correct copy.
>>> During writes, IO will continue on both replicas; SHD will take care of
>>> healing the files.
>>> After replacing the brick, we strongly recommend you to consider upgrading
>>> your gluster to one of the maintained versions. We have many stability
>>> related fixes there, which can handle some critical issues and corner
>>> cases which you could hit during these kinds of scenarios.
>> 
>> This will be the first priority in the infrastructure after getting this
>> cluster back to a fully functional replica 3. I will upgrade to 3.12.x and
>> then to version 5 or 6.
>> Sounds good.
>> 
>> If you are planning to use the same name for the new brick and you get an
>> error like "Brick may be containing or be contained by an existing brick"
>> even after using the force option, try using a different name. That should
>> work.
>> 
>> Regards,
>> Karthik
>> 
>> BR,
>> Martin
>> 
>>> Regards,
>>> Karthik
>>> I had some problems in the past when healing was triggered; VM disks
>>> became unresponsive because healing took most of the I/O. My volume
>>> contains only big files with VM disks.
>>> 
>>> Thanks for the suggestions.
>>> BR,
>>> Martin
>>> 
>>>> On 10 Apr 2019, at 12:38, Martin Toth <[email protected]> wrote:
>>>> 
>>>> Thanks, this looks OK to me. I will reset the brick because I don't have
>>>> any data left on the failed node, so I can use the same path / brick name.
>>>> 
>>>> Is resetting the brick a dangerous command? Should I be worried about
>>>> some possible failure that will impact the remaining two nodes? I am
>>>> running a really old 3.7.6, but a stable version.
>>>> 
>>>> Thanks,
>>>> BR!
>>>> 
>>>> Martin
>>>> 
>>>> 
>>>>> On 10 Apr 2019, at 12:20, Karthik Subrahmanya <[email protected]> wrote:
>>>>> 
>>>>> Hi Martin,
>>>>> 
>>>>> After you add the new disks and create the raid array, you can run one
>>>>> of the following commands to replace the old brick with the new one:
>>>>> 
>>>>> - If you are going to use a different name for the new brick, you can run
>>>>> gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
>>>>> 
>>>>> - If you are planning to use the same name for the new brick, you can use
>>>>> gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
>>>>> Here the old brick's and new brick's hostname & path should be the same.
>>>>> 
>>>>> After replacing the brick, make sure the brick comes online using volume
>>>>> status.
>>>>> Heal should start automatically; you can check the heal status to see
>>>>> that all the files get replicated to the newly added brick. If it does
>>>>> not start automatically, you can start it manually by running
>>>>> gluster volume heal <volname>.
>>>>> 
>>>>> HTH,
>>>>> Karthik
>>>>> 
>>>>> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth <[email protected]> wrote:
>>>>> Hi all,
>>>>> 
>>>>> I am running a replica 3 gluster volume with 3 bricks. One of my servers
>>>>> failed - all disks are showing errors and the raid is in a fault state.
>>>>> 
>>>>> Type: Replicate
>>>>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>>>>> Status: Started
>>>>> Number of Bricks: 1 x 3 = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
>>>>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is down
>>>>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>>>>> 
>>>>> So one of my bricks has failed completely (node2). It went down and all
>>>>> data on it is lost (failed raid on node2). Now I am running only two
>>>>> bricks on 2 servers out of 3.
>>>>> This is a really critical problem for us; we could lose all data. I want
>>>>> to add new disks to node2, create a new raid array on them and try to
>>>>> replace the failed brick on this node.
>>>>> 
>>>>> What is the procedure for replacing Brick2 on node2, can someone advise?
>>>>> I can't find anything relevant in the documentation.
>>>>> 
>>>>> Thanks in advance,
>>>>> Martin
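To summarize the plan based on Karthik's advice above, this is roughly the order of operations I intend to follow once the new raid array on node2 is ready (a sketch only - option and command names are taken from this thread or as I understand them from the docs, and should be verified against the installed Gluster version; <volname> is a placeholder):

Disable client-side healing while the new brick is being filled:

# gluster volume set <volname> cluster.data-self-heal off
# gluster volume set <volname> cluster.metadata-self-heal off

Re-add the brick under the same name:

# gluster volume reset-brick <volname> node2.san:/tank/gluster/gv0imagestore/brick1 node2.san:/tank/gluster/gv0imagestore/brick1 commit force

Check that the brick is online and monitor healing; trigger it manually if it does not start on its own:

# gluster volume status <volname>
# gluster volume heal <volname> info
# gluster volume heal <volname>

Once heal info shows no pending entries, turn client-side healing back on:

# gluster volume set <volname> cluster.data-self-heal on
# gluster volume set <volname> cluster.metadata-self-heal on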
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
