Thanks for the clarification, one more question. When I recover (boot) the failed node and this peer becomes available again to the remaining two nodes, how do I tell Gluster to mark this brick as failed?
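
(Side note: I assume that once the node is reconnected I can check how the remaining nodes see it with something like

# gluster peer status
# gluster volume status <volname>

and that the brick on node2 should simply show as offline there, since its path no longer exists. Please correct me if that assumption is wrong.)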
I mean, I’ve booted the failed node back without networking. The disk partition (a ZFS pool on other disks) where the brick lived before the failure is lost. Can I start gluster even though I no longer have the ZFS pool where the failed brick used to be? This won't be a problem when I connect this node back to the cluster (before the brick replace/reset command is issued)?

Thanks.

BR!
Martin

> On 11 Apr 2019, at 15:40, Karthik Subrahmanya <[email protected]> wrote:
>
> On Thu, Apr 11, 2019 at 6:38 PM Martin Toth <[email protected]> wrote:
> Hi Karthik,
>
>> On Thu, Apr 11, 2019 at 12:43 PM Martin Toth <[email protected]> wrote:
>> Hi Karthik,
>>
>> Moreover, I would like to ask if there are some recommended settings/parameters for SHD in order to achieve good or fair I/O while the volume is being healed after I replace the brick (this should trigger the healing process).
>> If I understand your concern correctly, you need fair I/O performance for clients while healing takes place as part of the replace-brick operation. For this you can turn off the "data-self-heal" and "metadata-self-heal" options until the heal completes on the new brick.
>
> This is exactly what I mean. I am running VM disks on the remaining 2 (out of 3 - one failed as mentioned) nodes and I need to ensure there will be fair I/O performance available on these two nodes while the replace-brick operation heals the volume.
> I will not run any VMs on the node where the replace-brick operation will be running. So if I understand correctly, when I set:
>
> # gluster volume set <volname> cluster.data-self-heal off
> # gluster volume set <volname> cluster.metadata-self-heal off
>
> this will tell Gluster clients (libgfapi and FUSE mount) not to read from the node where the replace-brick operation is in place but from the remaining two healthy nodes. Is this correct? Thanks for the clarification.
> The reads will be served from one of the good bricks, since the file will either not be present on the replaced brick at the time of the read, or it will be present but marked for heal if it is not already healed. If it has already been healed by SHD, then it could be served from the new brick as well, but there won't be any problem in reading from there in that scenario.
> By setting these two options, whenever a read comes from a client it will not try to heal the file for data/metadata. Otherwise it would try to heal (if not already healed by SHD) when the read comes in, hence slowing down the client.
>
>> Turning off client-side healing doesn't compromise data integrity and consistency. During a read request from a client, the pending xattr is evaluated for the replica copies and the read is only served from the correct copy. During writes, I/O will continue on both replicas and SHD will take care of healing the files.
>> After replacing the brick, we strongly recommend you to consider upgrading your gluster to one of the maintained versions. We have many stability related fixes there, which can handle some critical issues and corner cases you could hit during these kinds of scenarios.
>
> This will be the first priority in the infrastructure after getting this cluster back to a fully functional replica 3. I will upgrade to 3.12.x and then to version 5 or 6.
> Sounds good.
>
> If you are planning to use the same name for the new brick and you get an error like "Brick may be containing or be contained by an existing brick" even after using the force option, try using a different name.
> That should work.
>
> Regards,
> Karthik
>
> BR,
> Martin
>
>> Regards,
>> Karthik
>> I had some problems in the past when healing was triggered: VM disks became unresponsive because healing took most of the I/O. My volume contains only big files with VM disks.
>>
>> Thanks for the suggestions.
>> BR,
>> Martin
>>
>>> On 10 Apr 2019, at 12:38, Martin Toth <[email protected]> wrote:
>>>
>>> Thanks, this looks OK to me. I will reset the brick because I don't have any data left on the failed node, so I can use the same path / brick name.
>>>
>>> Is reset-brick a dangerous command? Should I be worried about some possible failure that will impact the remaining two nodes? I am running a really old (3.7.6) but stable version.
>>>
>>> Thanks,
>>> BR!
>>>
>>> Martin
>>>
>>>> On 10 Apr 2019, at 12:20, Karthik Subrahmanya <[email protected]> wrote:
>>>>
>>>> Hi Martin,
>>>>
>>>> After you add the new disks and create the raid array, you can run the following command to replace the old brick with the new one:
>>>>
>>>> - If you are going to use a different name for the new brick you can run
>>>> gluster volume replace-brick <volname> <old-brick> <new-brick> commit force
>>>>
>>>> - If you are planning to use the same name for the new brick as well, then you can use
>>>> gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
>>>> Here the old brick's and new brick's hostname & path should be the same.
>>>>
>>>> After replacing the brick, make sure the brick comes online using volume status.
>>>> Heal should start automatically; you can check the heal status to see all the files get replicated to the newly added brick. If it does not start automatically, you can start it manually by running gluster volume heal <volname>.
>>>>
>>>> HTH,
>>>> Karthik
>>>>
>>>> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> I am running a replica 3 gluster volume with 3 bricks. One of my servers failed - all disks are showing errors and the raid is in a fault state.
>>>>
>>>> Type: Replicate
>>>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
>>>> Status: Started
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
>>>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 <— this brick is down
>>>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>>>>
>>>> So one of my bricks has totally failed (node2). It went down and all data is lost (failed raid on node2). Now I am running only two bricks on 2 servers out of 3.
>>>> This is a really critical problem for us; we could lose all data. I want to add new disks to node2, create a new raid array on them and try to replace the failed brick on this node.
>>>>
>>>> What is the procedure for replacing Brick2 on node2, can someone advise? I can’t find anything relevant in the documentation.
>>>>
>>>> Thanks in advance,
>>>> Martin
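
PS: to put the steps from the thread above in one place, I assume the whole sequence will look roughly like this (a sketch only; <volname>, <old-brick> and <new-brick> are the same placeholders Karthik used, and the two self-heal options are meant to be switched back on once the heal finishes):

# gluster volume set <volname> cluster.data-self-heal off
# gluster volume set <volname> cluster.metadata-self-heal off

(replace the failed brick: reset-brick if the hostname and path stay the same, replace-brick if the new brick gets a different name)
# gluster volume reset-brick <volname> <old-brick> <new-brick> commit force
# gluster volume replace-brick <volname> <old-brick> <new-brick> commit force

(check that the new brick is online and watch the heal progress; trigger it manually with "gluster volume heal <volname>" if it does not start on its own)
# gluster volume status <volname>
# gluster volume heal <volname> info

(once heal info shows no pending entries)
# gluster volume set <volname> cluster.data-self-heal on
# gluster volume set <volname> cluster.metadata-self-heal on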
