Prove* a positive

On Sat, Jan 27, 2018, 8:45 AM David Turner <[email protected]> wrote:

> Unless you have entries in your snap_trimq, your problem isn't snap
> trimming. That is currently the way to check snap trimming, and you say
> you're caught up.
>
> Are you certain that you are querying the correct pool for the images you
> are snapshotting? You showed that you tested 4 different pools. You should
> only need to check the pool containing the images you are dealing with.
>
> You can inversely prove a positive by changing your snap_trim settings to
> do no cleanup, then checking whether the appropriate PGs accumulate
> anything in their snap_trimq.
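A sketch of that check, assuming a Luminous (12.2.x) cluster, which has a `nosnaptrim` OSD flag; the pool, image, and snapshot names are hypothetical:

```shell
# "Prove the positive": pause snap trimming cluster-wide, delete a snapshot,
# and confirm that snap_trimq grows. Requires a live cluster; the guard below
# only lets the commands run where the ceph CLI is actually available.
if command -v ceph >/dev/null 2>&1; then
  ceph osd set nosnaptrim                 # pause all snap trimming
  rbd snap rm mypool/myimage@oldsnap      # delete a snapshot to queue work
  ceph pg 55.4a query | grep snap_trimq   # the queue should now be non-empty
  ceph osd unset nosnaptrim               # resume trimming afterwards
else
  echo "ceph CLI not found; run these commands on a cluster node"
fi
sketch_done=1
```

Remember to unset the flag when done, or trim work will accumulate indefinitely.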
>
> On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]> wrote:
>
>> Are scrubbing and deep scrubbing necessary for the snaptrim operation to
>> happen?
>>
>> Karun Josy
>>
>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]> wrote:
>>
>>> Thank you for your quick response!
>>>
>>> I used the command to fetch the snap_trimq from many PGs; however, it
>>> seems they don't have anything in the queue.
>>>
>>> For eg :
>>> ====================
>>> $ echo $(( $(ceph pg  55.4a query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg  55.5a query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg  55.88 query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg  55.55 query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg  54.a query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg  34.1d query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> $ echo $(( $(ceph pg  1.3f query | grep snap_trimq | cut -d[ -f2 | cut
>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>> 0
>>> =====================
>>>
>>>
>>> While going through the PG query output, I found that these PGs have no
>>> values in the purged_snaps section either.
>>> For eg :
>>> ceph pg  55.80 query
>>> --
>>> ---
>>> ---
>>>  {
>>>             "peer": "83(3)",
>>>             "pgid": "55.80s3",
>>>             "last_update": "43360'15121927",
>>>             "last_complete": "43345'15073146",
>>>             "log_tail": "43335'15064480",
>>>             "last_user_version": 15066124,
>>>             "last_backfill": "MAX",
>>>             "last_backfill_bitwise": 1,
>>>             "purged_snaps": [],
>>>             "history": {
>>>                 "epoch_created": 5950,
>>>                 "epoch_pool_created": 5950,
>>>                 "last_epoch_started": 43339,
>>>                 "last_interval_started": 43338,
>>>                 "last_epoch_clean": 43340,
>>>                 "last_interval_clean": 43338,
>>>                 "last_epoch_split": 0,
>>>                 "last_epoch_marked_full": 42032,
>>>                 "same_up_since": 43338,
>>>                 "same_interval_since": 43338,
>>>                 "same_primary_since": 43276,
>>>                 "last_scrub": "35299'13072533",
>>>                 "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>>                 "last_deep_scrub": "31372'12176860",
>>>                 "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>>                 "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>>             },
>>>
>>> Not sure if it is related.
>>>
>>> The cluster is not open to any new clients, yet we see steady growth in
>>> space usage every day. In the worst case it might grow faster than we can
>>> add space, which would be dangerous.
>>>
>>> Any help is really appreciated.
>>>
>>> Karun Josy
>>>
>>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner <[email protected]>
>>> wrote:
>>>
>>>> "snap_trimq": "[]",
>>>>
>>>> That is exactly what you're looking for to see how much a PG still has
>>>> that needs to be cleaned up. I think something like this should give you
>>>> the number of entries in the snap_trimq for a PG:
>>>>
>>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d]
>>>> -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>>
>>>> Note, I'm not at a computer and am typing this from my phone, so it's
>>>> not pretty and I know of a few ways to do it better, but it should work
>>>> all the same.
>>>>
>>>> For your needs a visual inspection of several PGs should be sufficient
>>>> to see if there is anything in the snap_trimq to begin with.
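One caveat with the one-liner: the `- 1` discards the empty line that an empty `[]` produces, but it also swallows one real entry when the queue holds exactly one interval. A variant that handles both cases, demonstrated here against hypothetical sample lines rather than a live `ceph pg query`:

```shell
# Count snap_trimq intervals by matching the `start~length` pairs directly,
# so an empty list naturally counts as 0. The sample lines below stand in
# for real `ceph pg $pg query` output.
count_snap_trimq() {
  grep snap_trimq | cut -d'[' -f2 | cut -d']' -f1 | tr ',' '\n' | grep -c '~'
}

echo '    "snap_trimq": "[1~3,5~2]",' | count_snap_trimq           # prints 2
echo '    "snap_trimq": "[]",'        | count_snap_trimq || true   # prints 0
```

On a cluster you would pipe `ceph pg $pg query` into `count_snap_trimq` instead of the sample `echo`.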
>>>>
>>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]> wrote:
>>>>
>>>>>  Hi David,
>>>>>
>>>>> Thank you for the response. To be honest, I am afraid this is going to
>>>>> be an issue in our cluster.
>>>>> It seems snaptrim has not been running for some time now, maybe
>>>>> because we have been expanding the cluster, adding nodes, for the past
>>>>> few weeks.
>>>>>
>>>>> I would be really glad if you could guide me on how to overcome this.
>>>>> The cluster has about 30 TB of data and 11 million objects, with about
>>>>> 100 disks spread across 16 nodes. The version is 12.2.2.
>>>>> Searching through the mailing lists, I can see many cases where
>>>>> performance was affected while snaptrimming.
>>>>>
>>>>> Can you help me figure out these :
>>>>>
>>>>> - How do I find the snaptrim queue of a PG?
>>>>> - Can snaptrim be started on just one PG?
>>>>> - How can I make sure cluster I/O performance is not affected?
>>>>> I read about osd_snap_trim_sleep; how can it be changed?
>>>>> Is this the command: ceph tell osd.* injectargs
>>>>> '--osd_snap_trim_sleep 0.005'
>>>>>
>>>>> If so, what is the recommended value to use?
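For reference, a sketch of both the runtime and the persistent form of that setting; the 0.05 value is illustrative only, not a recommendation, and should be tuned and measured on the cluster in question:

```shell
# Two ways to set osd_snap_trim_sleep (seconds to pause between trim ops).
#
# 1) At runtime, from a node with an admin keyring:
#      ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
#
# 2) Persisted across OSD restarts, as a ceph.conf fragment
#    (written to a temp file here only to keep this sketch self-contained):
conf=$(mktemp)
cat > "$conf" <<'EOF'
[osd]
osd snap trim sleep = 0.05
EOF
grep -q 'osd snap trim sleep' "$conf" && echo "wrote fragment to $conf"
rm -f "$conf"
```

The injectargs change lasts until the OSDs restart; the ceph.conf entry makes it permanent.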
>>>>>
>>>>> Also, which parameters should we be concerned about? I would really
>>>>> appreciate any suggestions.
>>>>>
>>>>>
>>>>> Below is a brief extract of a PG queried
>>>>> ----------------------------
>>>>> ceph pg  55.77 query
>>>>> {
>>>>>     "state": "active+clean",
>>>>>     "snap_trimq": "[]",
>>>>> ---
>>>>> ----
>>>>>
>>>>> "pgid": "55.77s7",
>>>>>             "last_update": "43353'17222404",
>>>>>             "last_complete": "42773'16814984",
>>>>>             "log_tail": "42763'16812644",
>>>>>             "last_user_version": 16814144,
>>>>>             "last_backfill": "MAX",
>>>>>             "last_backfill_bitwise": 1,
>>>>>             "purged_snaps": [],
>>>>>             "history": {
>>>>>                 "epoch_created": 5950,
>>>>> ---
>>>>> ---
>>>>> ---
>>>>>
>>>>>
>>>>> Karun Josy
>>>>>
>>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> You may find the information in this ML thread useful.
>>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>>
>>>>>> It talks about a couple ways to track your snaptrim queue.
>>>>>>
>>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph cluster.
>>>>>>> When we delete snapshots, we are not seeing any change in space
>>>>>>> usage.
>>>>>>>
>>>>>>> I understand that Ceph OSDs delete data asynchronously, so deleting
>>>>>>> a snapshot doesn't free up disk space immediately. But we have not
>>>>>>> seen any change for some time.
>>>>>>>
>>>>>>> What could be the possible reason? Any suggestions would be really
>>>>>>> helpful, as the cluster size seems to grow each day even though
>>>>>>> snapshots are deleted.
>>>>>>>
>>>>>>>
>>>>>>> Karun
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>
>>>>>>
>>>>>
>>>
>>