Unless you have things in your snap_trimq, your problem isn't snap trimming. That is currently how you can check snap trimming, and you say you're caught up.
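As a self-contained illustration of what that check is counting, here is the same parsing applied to a sample snap_trimq line. The value below is made up for illustration; in practice it comes out of `ceph pg <pgid> query`, and each entry is a snap interval in start~length form, not an individual object:

```shell
# Hypothetical snap_trimq line as printed by `ceph pg <pgid> query`.
line='    "snap_trimq": "[1~3,5~2,9~1]",'

# Pull out the interval set between the brackets.
intervals=$(echo "$line" | cut -d'[' -f2 | cut -d']' -f1)

# An empty queue prints as "[]", which leaves an empty string here,
# so handle that case explicitly instead of subtracting 1.
if [ -z "$intervals" ]; then
  count=0
else
  count=$(echo "$intervals" | tr ',' '\n' | wc -l | tr -d ' ')
fi
echo "$count"
```

For this sample it prints 3. The "- 1" in the one-liner quoted below serves the same purpose: an empty "[]" still produces one line from `wc -l`, so one is subtracted to make the empty case read as zero.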
Are you certain that you are querying the correct pool for the images you
are snapshotting? You showed that you tested 4 different pools. You should
only need to check the pool with the images you are dealing with. You can
inversely prove a positive by changing your snap_trim settings to not do
any cleanup and see if the appropriate PGs have anything in their queue.

On Sat, Jan 27, 2018, 12:06 AM Karun Josy <[email protected]> wrote:

> Is scrubbing and deep scrubbing necessary for the snaptrim operation to
> happen?
>
> Karun Josy
>
> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]> wrote:
>
>> Thank you for your quick response!
>>
>> I used the command to fetch the snap_trimq from many PGs, however it
>> seems they don't have anything in queue.
>>
>> For eg:
>> ====================
>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>> 0
>> =====================
>>
>> While going through the PG query, I find that these PGs have no value in
>> the purged_snaps section either.
>> For eg:
>> ceph pg 55.80 query
>> --
>> ---
>> ---
>> {
>>     "peer": "83(3)",
>>     "pgid": "55.80s3",
>>     "last_update": "43360'15121927",
>>     "last_complete": "43345'15073146",
>>     "log_tail": "43335'15064480",
>>     "last_user_version": 15066124,
>>     "last_backfill": "MAX",
>>     "last_backfill_bitwise": 1,
>>     "purged_snaps": [],
>>     "history": {
>>         "epoch_created": 5950,
>>         "epoch_pool_created": 5950,
>>         "last_epoch_started": 43339,
>>         "last_interval_started": 43338,
>>         "last_epoch_clean": 43340,
>>         "last_interval_clean": 43338,
>>         "last_epoch_split": 0,
>>         "last_epoch_marked_full": 42032,
>>         "same_up_since": 43338,
>>         "same_interval_since": 43338,
>>         "same_primary_since": 43276,
>>         "last_scrub": "35299'13072533",
>>         "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>>         "last_deep_scrub": "31372'12176860",
>>         "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>>         "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>>     },
>>
>> Not sure if it is related.
>>
>> The cluster is not open to any new clients. However, we see a steady
>> growth of space usage every day. And in the worst-case scenario it might
>> grow faster than we can add more space, which would be dangerous.
>>
>> Any help is really appreciated.
>>
>> Karun Josy
>>
>> On Fri, Jan 26, 2018 at 8:23 PM, David Turner <[email protected]> wrote:
>>
>>> "snap_trimq": "[]",
>>>
>>> That is exactly what you're looking for to see how many objects a PG
>>> still has that need to be cleaned up. I think something like this should
>>> give you the number of objects in the snap_trimq for a PG.
>>>
>>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
>>>
>>> Note, I'm not at a computer and typing this from my phone, so it's not
>>> pretty and I know of a few ways to do that better, but it should work
>>> all the same.
>>>
>>> For your needs, a visual inspection of several PGs should be sufficient
>>> to see if there is anything in the snap_trimq to begin with.
>>>
>>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]> wrote:
>>>
>>>> Hi David,
>>>>
>>>> Thank you for the response. To be honest, I am afraid it is going to
>>>> be an issue in our cluster. It seems snaptrim has not been going on
>>>> for some time now, maybe because we were expanding the cluster,
>>>> adding nodes, for the past few weeks.
>>>>
>>>> I would be really glad if you can guide me on how to overcome this.
>>>> The cluster has about 30 TB of data and 11 million objects, with
>>>> about 100 disks spread across 16 nodes. The version is 12.2.2.
>>>> Searching through the mailing lists, I can see many cases where
>>>> performance was affected while snaptrimming.
>>>>
>>>> Can you help me figure out these:
>>>>
>>>> - How to find the snaptrim queue of a PG?
>>>> - Can snaptrim be started on just 1 PG?
>>>> - How can I make sure cluster IO performance is not affected?
>>>>   I read about osd_snap_trim_sleep; how can it be changed?
>>>>   Is this the command: ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'
>>>>   If yes, what is the recommended value that we can use?
>>>>
>>>> Also, what parameters should we be concerned about? I would really
>>>> appreciate any suggestions.
>>>>
>>>> Below is a brief extract of a PG queried:
>>>> ----------------------------
>>>> ceph pg 55.77 query
>>>> {
>>>>     "state": "active+clean",
>>>>     "snap_trimq": "[]",
>>>> ---
>>>> ----
>>>>     "pgid": "55.77s7",
>>>>     "last_update": "43353'17222404",
>>>>     "last_complete": "42773'16814984",
>>>>     "log_tail": "42763'16812644",
>>>>     "last_user_version": 16814144,
>>>>     "last_backfill": "MAX",
>>>>     "last_backfill_bitwise": 1,
>>>>     "purged_snaps": [],
>>>>     "history": {
>>>>         "epoch_created": 5950,
>>>> ---
>>>> ---
>>>> ---
>>>>
>>>> Karun Josy
>>>>
>>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner <[email protected]> wrote:
>>>>
>>>>> You may find the information in this ML thread useful.
>>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>>
>>>>> It talks about a couple of ways to track your snaptrim queue.
>>>>>
>>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph cluster.
>>>>>> When we delete snapshots, we are not seeing any change in space
>>>>>> usage.
>>>>>>
>>>>>> I understand that Ceph OSDs delete data asynchronously, so deleting
>>>>>> a snapshot doesn't free up the disk space immediately. But we have
>>>>>> not been seeing any change for some time.
>>>>>>
>>>>>> What can be the possible reason? Any suggestions would be really
>>>>>> helpful, as the cluster size seems to be growing each day even
>>>>>> though snapshots are deleted.
>>>>>>
>>>>>> Karun
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> [email protected]
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
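On the osd_snap_trim_sleep question above: the `ceph tell ... injectargs` form quoted in the thread is the usual way to change it at runtime on Luminous. A sketch, with the caveat that there is no universal recommended value; it trades trim throughput against client IO, so it has to be tuned per cluster:

```shell
# Runtime change on all OSDs (not persisted across OSD restarts).
# 0.005 is the value from the thread; a larger value such as 0.1
# slows trimming further in exchange for less client impact.
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.005'

# To make it persistent, also set it in ceph.conf under [osd]:
#   osd_snap_trim_sleep = 0.005
```

These commands require a running cluster; treat the numbers as starting points rather than recommendations.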
