I don't know why you keep asking the same question about snap trimming. You haven't shown any evidence that your cluster is behind on that. Have you looked into running fstrim inside your VMs?
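If deleted blocks are never discarded inside the guests, image usage keeps growing no matter what snap trimming does. A minimal sketch of what to check from inside a VM, assuming the virtual disks are attached with discard/unmap enabled on the hypervisor side (both commands need root):

```shell
# Show discard capabilities of the guest's block devices.
# Zero DISC-GRAN/DISC-MAX values mean TRIM cannot reach the backing RBD image.
lsblk --discard

# Trim all mounted filesystems that support discard, reporting what was freed.
fstrim --all --verbose
```

Running fstrim periodically (cron, or the fstrim.timer unit on systemd distributions) keeps guests from pinning deleted data in their images.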
On Mon, Jan 29, 2018, 4:30 AM Karun Josy <[email protected]> wrote:

> The fast-diff map is not enabled for the RBD images.
> Can that be a reason for trimming not happening?
>
> Karun Josy
>
> On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy <[email protected]> wrote:
>
>> Hi David,
>>
>> Thank you for your reply! I really appreciate it.
>>
>> The images are in pool id 55. It is an erasure-coded pool.
>>
>> ---------------
>> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> ---------------
>>
>> The current snap_trim_sleep value is the default, "osd_snap_trim_sleep": "0.000000". I assume that means there is no delay. (I can't find any documentation related to it.)
>> Will changing its value initiate snaptrimming, like this?
>> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
>>
>> Also, we are using an rbd user with the profile below. It is used while deleting snapshots:
>> -------
>> caps: [mon] profile rbd
>> caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm, profile rbd-read-only pool=templates
>> -------
>>
>> Can that be a reason?
>>
>> Also, can you let me know which logs to check while deleting snapshots to see if it is snaptrimming?
>> I am sorry, I feel like I am pestering you too much. But in the mailing lists I can see you have dealt with similar snapshot issues, so I think you can help me figure this mess out.
>>
>> Karun Josy
>>
>> On Sat, Jan 27, 2018 at 7:15 PM, David Turner <[email protected]> wrote:
>>
>>> Prove* a positive
>>>
>>> On Sat, Jan 27, 2018, 8:45 AM David Turner <[email protected]> wrote:
>>>
>>>> Unless you have things in your snap_trimq, your problem isn't snap trimming. That is currently how you can check snap trimming, and you say you're caught up.
>>>>
>>>> Are you certain that you are querying the correct pool for the images you are snapshotting? You showed that you tested 4 different pools. You should only need to check the pool with the images you are dealing with.
>>>>
>>>> You can inversely price a positive by changing your snap_trim settings to not do any cleanup and seeing if the appropriate PGs have anything in their queue.
>>>>
>>>> On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]> wrote:
>>>>
>>>>> Is scrubbing and deep scrubbing necessary for the snaptrim operation to happen?
>>>>>
>>>>> Karun Josy
>>>>>
>>>>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]> wrote:
>>>>>
>>>>>> Thank you for your quick response!
>>>>>>
>>>>>> I used the command to fetch the snap_trimq from many PGs; however, it seems they don't have anything in queue.
>>>>>>
>>>>>> For example:
>>>>>> ====================
>>>>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>> 0
>>>>>> ====================
>>>>>>
>>>>>> While going through the PG query, I find that these PGs have no value in the purged_snaps section either.
>>>>>> For example:
>>>>>> ceph pg 55.80 query
>>>>>> ---
>>>>>> {
>>>>>>     "peer": "83(3)",
>>>>>>     "pgid": "55.80s3",
>>>>>>     "last_update": "43360'15121927",
>>>>>>     "last_complete": "43345'15073146",
>>>>>>     "log_tail": "43335'15064480",
>>>>>>     "last_user_version": 15066124,
>>>>>>     "last_backfill": "MAX",
>>>>>>     "last_backfill_bitwise": 1,
>>>>>>     "purged_snaps": [],
>>>>>>     "history": {
>>>>>>         "epoch_created": 5950,
>>>>>>         "epoch_pool_created": 5950,
>>>>>>         "last_epoch_started": 43339,
>>>>>>         "last_interval_started": 43338,
>>>>>>         "last_epoch_clean": 43340,
>>>>>>         "last_interval_clean": 43338,
>>>>>>         "last_epoch_split": 0,
>>>>>>         "last_epoch_marked_full": 42032,
>>>>>>         "same_up_since": 43338,
>>>>>>         "same_interval_since": 43338,
>>>>>>         "same_primary_since": 43276,
>>>>>>         "last_scrub": "35299'13072533",
>>>>>>         "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>>>>>         "last_deep_scrub": "31372'12176860",
>>>>>>         "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>>>>>         "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>>>>>     },
>>>>>>
>>>>>> Not sure if it is related.
>>>>>>
>>>>>> The cluster is not open to any new clients. However, we see a steady growth of space usage every day. In the worst case it might grow faster than we can add space, which would be dangerous.
>>>>>>
>>>>>> Any help is really appreciated.
>>>>>>
>>>>>> Karun Josy
>>>>>>
>>>>>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner <[email protected]> wrote:
>>>>>>
>>>>>>> "snap_trimq": "[]",
>>>>>>>
>>>>>>> That is exactly what you're looking for to see how many objects a PG still has that need to be cleaned up. I think something like this should give you the number of objects in the snap_trimq for a PG:
>>>>>>>
>>>>>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>>>>
>>>>>>> Note, I'm not at a computer and typing this from my phone, so it's not pretty and I know of a few ways to do that better, but it should work all the same.
>>>>>>>
>>>>>>> For your needs, a visual inspection of several PGs should be sufficient to see if there is anything in the snap_trimq to begin with.
>>>>>>>
>>>>>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Thank you for the response. To be honest, I am afraid it is going to be an issue in our cluster. It seems snaptrim has not been going on for some time now, maybe because we were expanding the cluster, adding nodes, for the past few weeks.
>>>>>>>>
>>>>>>>> I would be really glad if you could guide me on how to overcome this. The cluster has about 30 TB of data and 11 million objects, with about 100 disks spread across 16 nodes. The version is 12.2.2. Searching through the mailing lists, I can see many cases where performance was affected while snaptrimming.
>>>>>>>>
>>>>>>>> Can you help me figure out these:
>>>>>>>>
>>>>>>>> - How to find the snaptrim queue of a PG.
>>>>>>>> - Can snaptrim be started on just 1 PG?
>>>>>>>> - How can I make sure cluster IO performance is not affected? I read about osd_snap_trim_sleep; how can it be changed? Is this the command: ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'? If yes, what is the recommended value that we can use?
>>>>>>>>
>>>>>>>> Also, what parameters should we be concerned about? I would really appreciate any suggestions.
>>>>>>>>
>>>>>>>> Below is a brief extract of a queried PG:
>>>>>>>> ----------------------------
>>>>>>>> ceph pg 55.77 query
>>>>>>>> {
>>>>>>>>     "state": "active+clean",
>>>>>>>>     "snap_trimq": "[]",
>>>>>>>>     ---
>>>>>>>>     "pgid": "55.77s7",
>>>>>>>>     "last_update": "43353'17222404",
>>>>>>>>     "last_complete": "42773'16814984",
>>>>>>>>     "log_tail": "42763'16812644",
>>>>>>>>     "last_user_version": 16814144,
>>>>>>>>     "last_backfill": "MAX",
>>>>>>>>     "last_backfill_bitwise": 1,
>>>>>>>>     "purged_snaps": [],
>>>>>>>>     "history": {
>>>>>>>>         "epoch_created": 5950,
>>>>>>>>         ---
>>>>>>>>
>>>>>>>> Karun Josy
>>>>>>>>
>>>>>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> You may find the information in this ML thread useful:
>>>>>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>>>>>
>>>>>>>>> It talks about a couple of ways to track your snaptrim queue.
>>>>>>>>>
>>>>>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph cluster. When we delete snapshots, we are not seeing any change in used space.
>>>>>>>>>>
>>>>>>>>>> I understand that Ceph OSDs delete data asynchronously, so deleting a snapshot doesn't free up the disk space immediately. But we are not seeing any change for some time now.
>>>>>>>>>>
>>>>>>>>>> What can be the possible reason? Any suggestions would be really helpful, as the cluster size seems to be growing each day even though snapshots are deleted.
>>>>>>>>>>
>>>>>>>>>> Karun
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ceph-users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
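The counting one-liner quoted above can be exercised without a cluster. A self-contained sketch of the same parsing logic, fed made-up snap_trimq values instead of live `ceph pg query` output:

```shell
# Hypothetical snap_trimq lines as they appear in `ceph pg <pgid> query` output.
busy='    "snap_trimq": "[1~3,5~2]",'   # two snap intervals still queued for trimming
empty='    "snap_trimq": "[]",'         # nothing queued

count_trimq() {
  # Same pipeline as in the thread: take what sits between [ and ],
  # split on commas, count the lines, and subtract 1 so "[]" yields 0.
  echo $(( $(echo "$1" | cut -d'[' -f2 | cut -d']' -f1 | tr ',' '\n' | wc -l) - 1 ))
}

count_trimq "$busy"    # prints 1
count_trimq "$empty"   # prints 0
```

Note that the pipeline effectively counts the commas inside the brackets, so a queue holding a single interval also reports 0; eyeballing the snap_trimq string itself, as suggested above, is the more reliable check.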
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
