Hi,

This sounds nice. I would like to ask: does the scan start from the local 
node's bricks first? (I am talking about --brick=one)

Best Regards,
Strahil Nikolov

On Mar 5, 2019 10:51, Ashish Pandey <[email protected]> wrote:
>
> Hi All,
>
> We have observed, and heard from gluster users, that the "heal info" 
> command takes a long time.
> Even when all we want to know is whether a gluster volume is healthy, the 
> command lists all the files from all the bricks before we can be sure 
> either way.
> Here, we propose some options for the "heal info" command which provide a 
> report quickly and reliably.
>
> gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]
> --------
>
> Problem: The "gluster v heal <volname> info" command picks each subvolume 
> and checks the .glusterfs/indices/xattrop folder of every brick of that 
> subvolume to find out if there is any entry which needs to be healed.
> It picks each entry and takes a lock on it to check its xattrs and find 
> out whether that entry actually needs heal or not.
> This LOCK->CHECK-XATTR->UNLOCK cycle takes a lot of time for each file.
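> The cycle above can be modelled with a toy simulation. All names here 
> (Brick, pending counters, needs_heal) are illustrative stand-ins for the 
> real gluster internals, not its actual API:

```python
# Toy model of the per-entry check described above: every entry on every
# brick goes through one LOCK->CHECK-XATTR->UNLOCK cycle.

class Brick:
    def __init__(self, name, pending):
        self.name = name
        self.pending = pending  # entry -> pending-heal counter (non-zero = needs heal)
        self.cycles = 0         # LOCK->CHECK-XATTR->UNLOCK cycles performed

    def needs_heal(self, entry):
        # One full LOCK -> CHECK-XATTR -> UNLOCK cycle, modelled as a counter
        self.cycles += 1
        return self.pending[entry] != 0

def heal_info(bricks):
    """Current behavior: run the cycle for every entry on every brick."""
    return [(b.name, e) for b in bricks
                        for e in b.pending
                        if b.needs_heal(e)]
```

> With N index entries mirrored on each of B bricks, this runs the cycle 
> N*B times, which is the cost the cases below try to cut down.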
>
> Let's consider the two most common cases in which we use "heal info" and 
> see how they can be improved.
>
> Case 1: Consider a 4+2 EC volume with its bricks on 6 different nodes.
> One brick of the volume is down and a client has written 10000 files on a 
> mount point of this volume. Entries for these 10K files will be created in 
> ".glusterfs/indices/xattrop" on each of the remaining 5 bricks. Now the 
> brick comes back UP, and when we use the "heal info" command for this 
> volume, it goes to all the bricks, picks up these 10K file entries, and 
> runs the LOCK->CHECK-XATTR->UNLOCK cycle for all of them. This happens for 
> every brick, which means we check 50K files and perform the 
> LOCK->CHECK-XATTR->UNLOCK cycle 50K times, while checking only 10K entries 
> would have been sufficient. This is a very time-consuming operation. If IOs 
> are happening on some new files, we check those files as well, which adds 
> to the time.
> Here, all we wanted to know is whether our volume has healed and is healthy.
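> The arithmetic of Case 1 in a few lines (numbers taken from the example 
> above):

```python
# Case-1 back-of-the-envelope: 1 brick down, 10K files written while it was down.
files = 10_000
surviving_bricks = 5                       # index entries exist on each of these
checks_current = files * surviving_bricks  # every brick's index is fully walked
checks_sufficient = files                  # entries from a single brick would do
print(checks_current, checks_sufficient)   # 50000 10000
```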
>
> Solution: Whenever a brick goes down and comes back up and we use the 
> "heal info" command, our *main intention* is to find out whether the volume 
> is *healthy* or *unhealthy*. A volume is unhealthy even if a single file is 
> unhealthy. So we should scan the bricks one by one, and as soon as we find 
> one brick with entries that need to be healed, we can stop, list those 
> files, and report that the volume is not healthy. There is no need to scan 
> the rest of the bricks. That is why the "--brick=[one,all]" option has been 
> introduced.
>
> "gluster v heal vol info --brick=[one,all]"
> "one" - Scan the bricks sequentially and, as soon as any unhealthy entries 
> are found, list them and stop scanning the remaining bricks.
> "all" - Behave just like the current implementation and list all the files 
> from all the bricks. If this option is not provided, the default (current) 
> behavior applies.
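> The two modes can be sketched side by side. A "brick" here is just a 
> (name, set of unhealthy entries) pair; the real implementation walks 
> .glusterfs/indices/xattrop instead:

```python
# Sketch of the proposed --brick=one early exit vs. the current --brick=all.

def scan_one(bricks):
    """--brick=one: stop at the first brick that has unhealthy entries."""
    for name, unhealthy in bricks:
        if unhealthy:
            return name, sorted(unhealthy)   # volume is unhealthy, stop here
    return None, []                          # all bricks clean: volume healthy

def scan_all(bricks):
    """--brick=all (current behavior): list entries from every brick."""
    return [(name, e) for name, unhealthy in bricks for e in sorted(unhealthy)]
```

> For a healthy volume both modes scan everything; the win is on unhealthy 
> volumes, where scan_one only touches bricks up to the first unhealthy one.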
>
> Case 2: Consider a 24 x (4+2) EC volume. Say one brick from *only one* of 
> the subvolumes has been replaced and a heal has been triggered.
> To know whether the volume is in a healthy state, we go to each brick of 
> *each and every subvolume* and check whether there are any entries in the 
> ".glusterfs/indices/xattrop" folder that need heal.
> If we know which subvolume participated in the brick replacement, we only 
> need to check the health of that subvolume, without querying/checking the 
> other subvolumes.
>
> If several clients are writing a number of files to this volume, an entry 
> for each of these files will be created in .glusterfs/indices/xattrop, and 
> the "heal info" command will go through the LOCK->CHECK-XATTR->UNLOCK cycle 
> for each of them to find out whether they need heal, which takes a lot of 
> time.
> In addition, clients will also see a performance drop, as they have to 
> release and re-take their locks.
>
> Solution: Provide an option to specify the subvolume for which we want to 
> check heal info.
>
> "gluster v heal vol info --subvol=<number of the subvolume>"
> Here, --subvol is given the number of the subvolume we want to check.
> Example:
> "gluster v heal vol info --subvol=1 "
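> The restriction can be sketched as follows. The mapping of subvolume 
> numbers to brick lists is an assumption here, mirroring the --subvol=1 
> example above; entries are again just names, not real xattrop indices:

```python
# Sketch of --subvol=N: restrict the scan to one subvolume's bricks.
# subvols maps subvolume number -> list of (brick name, unhealthy entries).

def heal_info(subvols, subvol=None):
    # With --subvol, only that subvolume is scanned; otherwise all of them.
    targets = {subvol: subvols[subvol]} if subvol is not None else subvols
    return [(n, brick, e)
            for n, bricks in sorted(targets.items())
            for brick, unhealthy in bricks
            for e in sorted(unhealthy)]
```

> On the 24 x (4+2) volume of Case 2, this turns a walk of 144 brick indices 
> into a walk of 6.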
>
>
> ===================================
> Performance Data - 
> A quick performance test done on standalone system.
>
> Type: Distributed-Disperse
> Volume ID: ea40eb13-d42c-431c-9c89-0153e834e67e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (4 + 2) = 12
> Transport-type: tcp
> Bricks:
> Brick1: apandey:/home/apandey/bricks/gluster/vol-1
> Brick2: apandey:/home/apandey/bricks/gluster/vol-2
> Brick3: apandey:/home/apandey/bricks/gluster/vol-3
> Brick4: apandey:/home/apandey/bricks/gluster/vol-4
> Brick5: apandey:/home/apandey/bricks/gluster/vol-5
> Brick6: apandey:/home/apandey/bricks/gluster/vol-6
> Brick7: apandey:/home/apandey/bricks/gluster/new-1
> Brick8: apandey:/home/apandey/bricks/gluster/new-2
> Brick9: apandey:/home/apandey/bricks/gluster/new-3
> Brick10: apandey:/home/apandey/bricks/gluster/new-4
> Brick11: apandey:/home/apandey/bricks/gluster/new-5
> Brick12: apandey:/home/apandey/bricks/gluster/new-6
>
> The shd (self-heal daemon) was disabled just to collect this data -
>
> Killed one brick in each of the two subvolumes and wrote 2000 files on the 
> mount point:
> [root@apandey vol]# for i in {1..2000};do echo abc >> file-$i; done
>
> The volume was then started with the force option and "heal info" was run. 
> Following is the data -
>
> [root@apandey glusterfs]# time gluster v heal vol info --brick=one >> 
> /dev/null             <<<<<<<< This scans the bricks one by one and exits 
> as soon as it finds that the volume is unhealthy.
>
> real    0m8.316s
> user    0m2.241s
> sys    0m1.278s
> [root@apandey glusterfs]# 
>
> [root@apandey glusterfs]# time gluster v heal vol info >> /dev/null           
>                        <<<<<<<< This is the current behavior.
>
> real    0m26.097s
> user    0m10.868s
> sys    0m6.198s
> [root@apandey glusterfs]# 
> ===================================
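> From the wall-clock ("real") times above, the early-exit scan is roughly 
> 3x faster on this small 2 x (4+2) setup; the gap should widen with more 
> subvolumes and bricks:

```python
# Speedup implied by the "real" times measured above.
speedup = 26.097 / 8.316   # current behavior / --brick=one
print(f"{speedup:.1f}x")   # 3.1x
```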
>
> I would like your comments/suggestions on these improvements.
> In particular, I would like to hear about the new syntax of the command -
>
> gluster v heal vol info --subvol=[number of the subvol] --brick=[one,all]
>
> Note that if the new options are not provided, the command will behave 
> just as it does right now.
> Also, this improvement applies to both AFR and EC volumes.
>
> ---
> Ashish
>
_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-devel
