Hi,

This sounds nice. I would like to ask whether the scan starts from the local
node's bricks first? (I am talking about --brick=one)
Best Regards,
Strahil Nikolov

On Mar 5, 2019 10:51, Ashish Pandey <[email protected]> wrote:
>
> Hi All,
>
> We have observed and heard from gluster users about the long time the
> "heal info" command takes. Even when all we want to know is whether a
> gluster volume is healthy or not, it takes time to list all the files
> from all the bricks, after which we can be sure if the volume is healthy
> or not.
> Here, we have come up with some options for the "heal info" command
> which provide a report quickly and reliably.
>
> gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]
> --------
>
> Problem: The "gluster v heal <volname> info" command picks each
> subvolume and checks the .glusterfs/indices/xattrop folder of every
> brick of that subvolume to find out if there is any entry which needs to
> be healed. It picks each entry and takes a lock on it to check its
> xattrs and find out whether that entry actually needs heal or not.
> This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file.
>
> Let's consider the two most common cases for which we use "heal info"
> and try to understand the improvements.
>
> Case 1: Consider a 4+2 EC volume with all the bricks on 6 different
> nodes. A brick of the volume is down and a client has written 10000
> files on one of the mount points of this volume. Entries for these 10K
> files will be created in ".glusterfs/indices/xattrop" on all of the
> remaining 5 bricks. Now, when the brick is UP and we use the "heal info"
> command for this volume, it goes to all the bricks, picks these 10K file
> entries, and goes through the LOCK->CHECK-XATTR->UNLOCK cycle for all of
> them. This happens for all the bricks, which means we check 50K files
> and perform the LOCK->CHECK-XATTR->UNLOCK cycle 50K times, when checking
> only 10K entries would have been sufficient. It is a very time consuming
> operation. If IOs are happening on some new files, we check these files
> as well, which adds to the time.
> Here, all we wanted to know is whether our volume has healed and is
> healthy.
>
> Solution: Whenever a brick goes down and comes up and we use the "heal
> info" command, our *main intention* is to find out if the volume is
> *healthy* or *unhealthy*. A volume is unhealthy even if one file is not
> healthy. So, we should scan the bricks one by one, and as soon as we
> find one brick with entries that require heal, we can stop, list those
> files, and say the volume is not healthy. There is no need to scan the
> rest of the bricks. That is where the "--brick=[one,all]" option has
> been introduced.
>
> "gluster v heal vol info --brick=[one,all]"
> "one" - It will scan the bricks sequentially and, as soon as it finds
> any unhealthy entries, it will list them and stop scanning the other
> bricks. A rough sketch of this early-exit idea follows.
> "all" - It will act just like the current behavior and provide all the
> files from all the bricks. If we do not provide this option, the default
> (current) behavior applies.
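>
> To make the early-exit idea concrete, here is a rough shell sketch of
> what "--brick=one" conceptually does. This is only an illustration, not
> the real implementation: the actual command still confirms each entry
> through the LOCK->CHECK-XATTR->UNLOCK cycle inside glusterfs (index
> entries can be stale), while this sketch only counts index entries. The
> brick paths and the base index link name are placeholder assumptions.
>
> #!/bin/bash
> # Walk the bricks of the volume one by one and stop at the first brick
> # whose xattrop index directory contains pending entries - even a
> # single such entry means the volume is unhealthy.
> for brick in /home/apandey/bricks/gluster/vol-{1..6}; do
>     index_dir="$brick/.glusterfs/indices/xattrop"
>     # Count entries, skipping the always-present base "xattrop-<uuid>"
>     # link; the remaining names are gfids of files that may need heal.
>     pending=$(ls "$index_dir" 2>/dev/null | grep -vc '^xattrop-')
>     if [ "$pending" -gt 0 ]; then
>         echo "$brick: $pending entries may need heal - volume unhealthy"
>         exit 1   # early exit: no need to scan the remaining bricks
>     fi
> done
> echo "no pending index entries - volume looks healthy"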
>
> Case 2: Consider a 24 x (4+2) EC volume. Let's say one brick from *only
> one* of the subvolumes has been replaced and a heal has been triggered.
> To know if the volume is in a healthy state, we go to each brick of
> *each and every subvolume* and check whether there are any entries in
> the ".glusterfs/indices/xattrop" folder which need heal. If we know
> which subvolume participated in the brick replacement, we just need to
> check the health of that subvolume and not query/check the other
> subvolumes.
>
> If several clients are writing a number of files on this volume, an
> entry for each of these files will be created in
> .glusterfs/indices/xattrop, and the "heal info" command will go through
> the LOCK->CHECK-XATTR->UNLOCK cycle to find out whether these entries
> need heal or not, which takes a lot of time.
> In addition, a client will also see a performance drop, as it will have
> to release and take the lock again.
>
> Solution: Provide an option to specify the number of the subvolume for
> which we want to check heal info.
>
> "gluster v heal vol info --subvol=<number of the subvolume>"
> Here, --subvol is given the number of the subvolume we want to check.
> Example:
> "gluster v heal vol info --subvol=1"
>
> ===================================
> Performance Data -
> A quick performance test done on a standalone system.
>
> Type: Distributed-Disperse
> Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (4 + 2) = 12
> Transport-type: tcp
> Bricks:
> Brick1: apandey:/home/apandey/bricks/gluster/vol-1
> Brick2: apandey:/home/apandey/bricks/gluster/vol-2
> Brick3: apandey:/home/apandey/bricks/gluster/vol-3
> Brick4: apandey:/home/apandey/bricks/gluster/vol-4
> Brick5: apandey:/home/apandey/bricks/gluster/vol-5
> Brick6: apandey:/home/apandey/bricks/gluster/vol-6
> Brick7: apandey:/home/apandey/bricks/gluster/new-1
> Brick8: apandey:/home/apandey/bricks/gluster/new-2
> Brick9: apandey:/home/apandey/bricks/gluster/new-3
> Brick10: apandey:/home/apandey/bricks/gluster/new-4
> Brick11: apandey:/home/apandey/bricks/gluster/new-5
> Brick12: apandey:/home/apandey/bricks/gluster/new-6
>
> I disabled the shd (self-heal daemon) to get the data.
>
> Killed one brick from each of the two subvolumes and wrote 2000 files
> on the mount point:
> [root@apandey vol]# for i in {1..2000}; do echo abc >> file-$i; done
>
> Then started the volume using the force option and ran heal info.
> Following is the data -
>
> [root@apandey glusterfs]# time gluster v heal vol info --brick=one >> /dev/null
> <<<<<<<< This scans the bricks one by one and comes out as soon as we
> find the volume is unhealthy.
>
> real    0m8.316s
> user    0m2.241s
> sys     0m1.278s
>
> [root@apandey glusterfs]# time gluster v heal vol info >> /dev/null
> <<<<<<<< This is the current behavior.
>
> real    0m26.097s
> user    0m10.868s
> sys     0m6.198s
> ===================================
>
> I would like your comments/suggestions on these improvements.
> Especially, I would like to hear your thoughts on the new syntax of the
> command -
>
> gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]
>
> Note that if we do not provide the new options, the command will behave
> just as it does right now.
> Also, this improvement is valid for both AFR and EC.
>
> ---
> Ashish
_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
