Prove* a positive.
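To pause cleanup as I suggested below, something like this should work.
Untested, and it assumes a Luminous cluster, where the nosnaptrim flag
exists (you're on 12.2.2, so it should apply):

# pause snap trimming cluster-wide; pending trims should then start
# piling up in the snap_trimq of the affected PGs
ceph osd set nosnaptrim
# watch the queues for a while, then re-enable trimming
ceph osd unset nosnaptrim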
On Sat, Jan 27, 2018, 8:45 AM David Turner <[email protected]> wrote:

> Unless you have things in your snap_trimq, your problem isn't snap
> trimming. That is currently how you can check snap trimming, and you say
> you're caught up.
>
> Are you certain that you are querying the correct pool for the images you
> are snapshotting? You showed that you tested 4 different pools; you should
> only need to check the pool with the images you are dealing with.
>
> You can inversely price a positive by changing your snap_trim settings to
> not do any cleanup and see if the appropriate PGs have anything in their
> queue.
>
> On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]> wrote:
>
>> Is scrubbing and deep scrubbing necessary for the snaptrim operation to
>> happen?
>>
>> Karun Josy
>>
>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]> wrote:
>>
>>> Thank you for your quick response!
>>>
>>> I used the command to fetch the snap_trimq from many PGs; however, it
>>> seems they don't have anything in queue.
>>>
>>> For example:
>>> ====================
>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> =====================
>>>
>>> While going through the PG query, I find that these PGs have no value
>>> in the purged_snaps section either. For example:
>>>
>>> ceph pg 55.80 query
>>> --
>>> ---
>>> ---
>>> {
>>>     "peer": "83(3)",
>>>     "pgid": "55.80s3",
>>>     "last_update": "43360'15121927",
>>>     "last_complete": "43345'15073146",
>>>     "log_tail": "43335'15064480",
>>>     "last_user_version": 15066124,
>>>     "last_backfill": "MAX",
>>>     "last_backfill_bitwise": 1,
>>>     "purged_snaps": [],
>>>     "history": {
>>>         "epoch_created": 5950,
>>>         "epoch_pool_created": 5950,
>>>         "last_epoch_started": 43339,
>>>         "last_interval_started": 43338,
>>>         "last_epoch_clean": 43340,
>>>         "last_interval_clean": 43338,
>>>         "last_epoch_split": 0,
>>>         "last_epoch_marked_full": 42032,
>>>         "same_up_since": 43338,
>>>         "same_interval_since": 43338,
>>>         "same_primary_since": 43276,
>>>         "last_scrub": "35299'13072533",
>>>         "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>>         "last_deep_scrub": "31372'12176860",
>>>         "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>>         "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>>     },
>>>
>>> Not sure if it is related.
>>>
>>> The cluster is not open to any new clients; however, we see a steady
>>> growth in space usage every day. In the worst-case scenario, it might
>>> grow faster than we can add more space, which would be dangerous.
>>>
>>> Any help is really appreciated.
>>>
>>> Karun Josy
>>>
>>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner <[email protected]> wrote:
>>>
>>>> "snap_trimq": "[]",
>>>>
>>>> That is exactly what you're looking for to see how many objects a PG
>>>> still has that need to be cleaned up. I think something like this
>>>> should give you the number of objects in the snap_trimq for a PG:
>>>>
>>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>
>>>> Note, I'm not at a computer and typing this on my phone, so it's not
>>>> pretty, and I know of a few ways to do that better, but it should work
>>>> all the same.
>>>>
>>>> For your needs, a visual inspection of several PGs should be
>>>> sufficient to see if there is anything in the snap_trimq to begin with.
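>>>> For instance, since pg query output is JSON, something like this with
>>>> jq is one of those cleaner ways (also untested, and it assumes jq is
>>>> installed on the machine you run it from):
>>>>
>>>> # print the raw snap_trimq for a handful of PGs; an empty queue shows as []
>>>> for pg in 55.4a 55.5a 55.88 55.55; do
>>>>     echo "$pg: $(ceph pg $pg query | jq -r '.snap_trimq')"
>>>> done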
>>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> Thank you for the response. To be honest, I am afraid it is going to
>>>>> be an issue in our cluster. It seems snaptrim has not been going on
>>>>> for some time now, maybe because we were expanding the cluster,
>>>>> adding nodes, for the past few weeks.
>>>>>
>>>>> I would be really glad if you could guide me on how to overcome this.
>>>>> The cluster has about 30 TB of data and 11 million objects, with
>>>>> about 100 disks spread across 16 nodes. The version is 12.2.2.
>>>>> Searching through the mailing lists, I can see many cases where
>>>>> performance was affected while snaptrimming.
>>>>>
>>>>> Can you help me figure out these:
>>>>>
>>>>> - How to find the snaptrim queue of a PG?
>>>>> - Can snaptrim be started on just one PG?
>>>>> - How can I make sure cluster IO performance is not affected?
>>>>>   I read about osd_snap_trim_sleep; how can it be changed?
>>>>>   Is this the command: ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'
>>>>>   If yes, what is the recommended value we can use? (See the sketch
>>>>>   below for what I mean.)
>>>>>
>>>>> Also, what other parameters should we be concerned about? I would
>>>>> really appreciate any suggestions.
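>>>>> To be concrete, this is the sort of thing I mean (just a sketch; I
>>>>> have not run it yet, and osd.0 is only an example):
>>>>>
>>>>> # check the current value via the admin socket (run on the node hosting osd.0)
>>>>> ceph daemon osd.0 config get osd_snap_trim_sleep
>>>>> # inject a sleep between trim operations on all OSDs to throttle snaptrim
>>>>> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'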
>>>>> Below is a brief extract of a PG I queried:
>>>>> ----------------------------
>>>>> ceph pg 55.77 query
>>>>> {
>>>>>     "state": "active+clean",
>>>>>     "snap_trimq": "[]",
>>>>> ---
>>>>> ----
>>>>>     "pgid": "55.77s7",
>>>>>     "last_update": "43353'17222404",
>>>>>     "last_complete": "42773'16814984",
>>>>>     "log_tail": "42763'16812644",
>>>>>     "last_user_version": 16814144,
>>>>>     "last_backfill": "MAX",
>>>>>     "last_backfill_bitwise": 1,
>>>>>     "purged_snaps": [],
>>>>>     "history": {
>>>>>         "epoch_created": 5950,
>>>>> ---
>>>>> ---
>>>>> ---
>>>>>
>>>>> Karun Josy
>>>>>
>>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner <[email protected]> wrote:
>>>>>
>>>>>> You may find the information in this ML thread useful:
>>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>>
>>>>>> It talks about a couple of ways to track your snaptrim queue.
>>>>>>
>>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph cluster.
>>>>>>> When we delete snapshots, we are not seeing any change in used
>>>>>>> space.
>>>>>>>
>>>>>>> I understand that Ceph OSDs delete data asynchronously, so deleting
>>>>>>> a snapshot doesn't free up the disk space immediately. But we have
>>>>>>> not seen any change for some time.
>>>>>>>
>>>>>>> What can be the possible reason? Any suggestions would be really
>>>>>>> helpful, as the cluster size seems to be growing each day even
>>>>>>> though snapshots are deleted.
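>>>>>>> For reference, we are watching usage with something like this,
>>>>>>> where "ecpool" is just a placeholder for our pool name:
>>>>>>>
>>>>>>> # per-pool usage; re-run periodically to see whether USED shrinks
>>>>>>> ceph df detail | grep ecpool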
>>>>>>>
>>>>>>> Karun
