The fast-diff object map feature is not enabled on the RBD images. Could that be a reason why trimming is not happening?
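For reference, whether fast-diff is enabled on an image can be read from `rbd info <pool>/<image> --format json`. A minimal sketch of checking the feature list, assuming that JSON shape (the image name and feature list below are an illustrative sample, not taken from this cluster):

```python
import json

# Illustrative sample of `rbd info --format json` output (trimmed);
# a real dump carries more fields (size, order, block_name_prefix, ...).
info = json.loads("""
{"name": "vm-disk-1",
 "features": ["layering", "exclusive-lock", "object-map", "fast-diff"]}
""")

# fast-diff requires object-map; both appear in the "features" list.
has_fast_diff = "fast-diff" in info["features"]
print(has_fast_diff)  # True
```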
Karun Josy

On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy <[email protected]> wrote:

> Hi David,
>
> Thank you for your reply! I really appreciate it.
>
> The images are in pool id 55. It is an erasure-coded pool.
>
> ---------------
> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> --------------
>
> The current snap_trim_sleep value is the default:
> "osd_snap_trim_sleep": "0.000000". I assume that means there is no delay.
> (I can't find any documentation related to it.)
> Will changing its value initiate snap trimming, like:
> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
>
> Also, we are using an rbd user with the profile below. It is used while
> deleting snapshots:
> -------
> caps: [mon] profile rbd
> caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm, profile
> rbd-read-only pool=templates
> -------
>
> Could that be a reason?
>
> Also, can you let me know which logs to check while deleting snapshots
> to see if it is snaptrimming?
> I am sorry; I feel like I am pestering you too much.
> But on the mailing lists I can see you have dealt with similar issues
> with snapshots, so I think you can help me figure this mess out.
>
> Karun Josy
>
> On Sat, Jan 27, 2018 at 7:15 PM, David Turner <[email protected]> wrote:
>
>> Prove* a positive
>>
>> On Sat, Jan 27, 2018, 8:45 AM David Turner <[email protected]> wrote:
>>
>>> Unless you have things in your snap_trimq, your problem isn't snap
>>> trimming. That is currently how you can check snap trimming, and you
>>> say you're caught up.
>>>
>>> Are you certain that you are querying the correct pool for the images
>>> you are snapshotting? You showed that you tested 4 different pools.
>>> You should only need to check the pool with the images you are dealing
>>> with.
>>>
>>> You can inversely prove a positive by changing your snap_trim settings
>>> to not do any cleanup and see if the appropriate PGs have anything in
>>> their queue.
>>>
>>> On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]> wrote:
>>>
>>>> Are scrubbing and deep scrubbing necessary for the snaptrim operation
>>>> to happen?
>>>>
>>>> Karun Josy
>>>>
>>>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]> wrote:
>>>>
>>>>> Thank you for your quick response!
>>>>>
>>>>> I used the command to fetch the snap_trimq from many PGs; however, it
>>>>> seems they don't have anything in queue.
>>>>>
>>>>> For example:
>>>>> ====================
>>>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>> 0
>>>>> =====================
>>>>>
>>>>> While going through the PG query, I find that these PGs have no value
>>>>> in the purged_snaps section either.
>>>>> For example:
>>>>> ceph pg 55.80 query
>>>>> ---
>>>>> ---
>>>>> {
>>>>>     "peer": "83(3)",
>>>>>     "pgid": "55.80s3",
>>>>>     "last_update": "43360'15121927",
>>>>>     "last_complete": "43345'15073146",
>>>>>     "log_tail": "43335'15064480",
>>>>>     "last_user_version": 15066124,
>>>>>     "last_backfill": "MAX",
>>>>>     "last_backfill_bitwise": 1,
>>>>>     "purged_snaps": [],
>>>>>     "history": {
>>>>>         "epoch_created": 5950,
>>>>>         "epoch_pool_created": 5950,
>>>>>         "last_epoch_started": 43339,
>>>>>         "last_interval_started": 43338,
>>>>>         "last_epoch_clean": 43340,
>>>>>         "last_interval_clean": 43338,
>>>>>         "last_epoch_split": 0,
>>>>>         "last_epoch_marked_full": 42032,
>>>>>         "same_up_since": 43338,
>>>>>         "same_interval_since": 43338,
>>>>>         "same_primary_since": 43276,
>>>>>         "last_scrub": "35299'13072533",
>>>>>         "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>>>>         "last_deep_scrub": "31372'12176860",
>>>>>         "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>>>>         "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>>>>     },
>>>>>
>>>>> Not sure if it is related.
>>>>>
>>>>> The cluster is not open to any new clients. However, we see steady
>>>>> growth of space usage every day.
>>>>> In the worst case, it might grow faster than we can add more space,
>>>>> which would be dangerous.
>>>>>
>>>>> Any help is really appreciated.
>>>>>
>>>>> Karun Josy
>>>>>
>>>>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner <[email protected]> wrote:
>>>>>
>>>>>> "snap_trimq": "[]",
>>>>>>
>>>>>> That is exactly what you're looking for to see how many objects a PG
>>>>>> still has that need to be cleaned up. I think something like this
>>>>>> should give you the number of objects in the snap_trimq for a PG.
>>>>>>
>>>>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>
>>>>>> Note, I'm not at a computer and typing this from my phone, so it's
>>>>>> not pretty, and I know of a few ways to do that better, but it should
>>>>>> work all the same.
>>>>>>
>>>>>> For your needs, a visual inspection of several PGs should be
>>>>>> sufficient to see if there is anything in the snap_trimq to begin with.
>>>>>>
>>>>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]> wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Thank you for the response. To be honest, I am afraid it is going to
>>>>>>> be an issue in our cluster.
>>>>>>> It seems snaptrim has not been running for some time now, maybe
>>>>>>> because we were expanding the cluster, adding nodes, for the past few
>>>>>>> weeks.
>>>>>>>
>>>>>>> I would be really glad if you could guide me on how to overcome this.
>>>>>>> The cluster has about 30 TB of data and 11 million objects, with about
>>>>>>> 100 disks spread across 16 nodes. The version is 12.2.2.
>>>>>>> Searching through the mailing lists, I can see many cases where
>>>>>>> performance was affected while snaptrimming.
>>>>>>>
>>>>>>> Can you help me figure out these:
>>>>>>>
>>>>>>> - How to find the snaptrim queue of a PG.
>>>>>>> - Can snaptrim be started on just 1 PG?
>>>>>>> - How can I make sure cluster IO performance is not affected?
>>>>>>>   I read about osd_snap_trim_sleep; how can it be changed?
>>>>>>>   Is this the command: ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'
>>>>>>>   If yes, what is the recommended value that we can use?
>>>>>>>
>>>>>>> Also, what other parameters should we be concerned about? I would
>>>>>>> really appreciate any suggestions.
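A note on the one-liner above: `snap_trimq` is printed as an interval set such as `[1~3,8~2]`, where each entry is `first~count`, so counting comma-separated entries gives the number of intervals rather than the number of snapshots queued. A minimal sketch that sums the interval lengths instead, assuming that decimal `first~count` format:

```python
import re

def snap_trimq_size(trimq: str) -> int:
    # trimq is an interval-set string like "[1~3,8~2]": each entry is
    # first_snapid~count, and an empty queue prints as "[]".
    return sum(int(count)
               for _, count in re.findall(r'(\d+)~(\d+)', trimq))

print(snap_trimq_size("[]"))         # 0
print(snap_trimq_size("[1~3,8~2]"))  # 5
```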
>>>>>>>
>>>>>>> Below is a brief extract of a PG query:
>>>>>>> ----------------------------
>>>>>>> ceph pg 55.77 query
>>>>>>> {
>>>>>>>     "state": "active+clean",
>>>>>>>     "snap_trimq": "[]",
>>>>>>> ---
>>>>>>> ---
>>>>>>>     "pgid": "55.77s7",
>>>>>>>     "last_update": "43353'17222404",
>>>>>>>     "last_complete": "42773'16814984",
>>>>>>>     "log_tail": "42763'16812644",
>>>>>>>     "last_user_version": 16814144,
>>>>>>>     "last_backfill": "MAX",
>>>>>>>     "last_backfill_bitwise": 1,
>>>>>>>     "purged_snaps": [],
>>>>>>>     "history": {
>>>>>>>         "epoch_created": 5950,
>>>>>>> ---
>>>>>>> ---
>>>>>>> ---
>>>>>>>
>>>>>>> Karun Josy
>>>>>>>
>>>>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner <[email protected]> wrote:
>>>>>>>
>>>>>>>> You may find the information in this ML thread useful.
>>>>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>>>>
>>>>>>>> It talks about a couple of ways to track your snaptrim queue.
>>>>>>>>
>>>>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph cluster.
>>>>>>>>> When we delete snapshots, we are not seeing any change in used
>>>>>>>>> space.
>>>>>>>>>
>>>>>>>>> I understand that Ceph OSDs delete data asynchronously, so deleting
>>>>>>>>> a snapshot doesn't free up the disk space immediately. But we are
>>>>>>>>> not seeing any change for some time.
>>>>>>>>>
>>>>>>>>> What can be the possible reason? Any suggestions would be really
>>>>>>>>> helpful, as the cluster size seems to be growing each day even
>>>>>>>>> though snapshots are deleted.
>>>>>>>>>
>>>>>>>>> Karun
>>>>>>>>> _______________________________________________
>>>>>>>>> ceph-users mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
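Beyond per-PG queries, trimming activity is also visible in PG states: while an OSD is trimming, the PG state includes `snaptrim`, and PGs waiting their turn show `snaptrim_wait`, so scanning `ceph pg dump --format json` is a quick cluster-wide check. A rough sketch over that JSON, assuming the Luminous-era `pg_stats` layout (the sample below is illustrative, not from this cluster):

```python
import json

# Illustrative, trimmed sample of `ceph pg dump --format json`; the
# real dump carries many more fields per PG.
sample = json.loads("""
{"pg_stats": [
  {"pgid": "55.77s7", "state": "active+clean"},
  {"pgid": "55.80s3", "state": "active+clean+snaptrim"},
  {"pgid": "55.58s0", "state": "active+clean+snaptrim_wait"}
]}
""")

# Any PG whose state mentions snaptrim is trimming or queued to trim.
trimming = [pg["pgid"] for pg in sample["pg_stats"]
            if "snaptrim" in pg["state"]]
print(trimming)  # ['55.80s3', '55.58s0']
```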
