Are scrubbing and deep scrubbing necessary for the snaptrim operation to happen?

Karun Josy

On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy <[email protected]> wrote:

> Thank you for your quick response!
>
> I used the command to fetch the snap_trimq from many PGs; however, it seems
> none of them have anything in the queue.
>
> For eg :
> ====================
> $ echo $(( $(ceph pg  55.4a query | grep snap_trimq | cut -d[ -f2 | cut
> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg  55.5a query | grep snap_trimq | cut -d[ -f2 | cut
> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg  55.88 query | grep snap_trimq | cut -d[ -f2 | cut
> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg  55.55 query | grep snap_trimq | cut -d[ -f2 | cut
> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg  54.a query | grep snap_trimq | cut -d[ -f2 | cut -d]
> -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg  34.1d query | grep snap_trimq | cut -d[ -f2 | cut
> -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> $ echo $(( $(ceph pg  1.3f query | grep snap_trimq | cut -d[ -f2 | cut -d]
> -f1 | tr ',' '\n' | wc -l) - 1 ))
> 0
> =====================
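[The repeated checks above can be folded into a single loop — a sketch to run against a live cluster, with the PG ids taken from the examples above; `grep -c .` counts non-empty lines, so an empty queue yields 0 without the trailing `- 1` correction:]

```shell
# Loop the snap_trimq check over several PGs instead of repeating the
# pipeline by hand (requires access to a running Ceph cluster).
for pg in 55.4a 55.5a 55.88 55.55 54.a 34.1d 1.3f; do
  n=$(ceph pg "$pg" query | grep snap_trimq | cut -d'[' -f2 | cut -d']' -f1 |
      tr ',' '\n' | grep -c .)
  echo "$pg: $n entries in snap_trimq"
done
```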
>
>
> While going through the PG query, I find that these PGs have no value in
> purged_snaps section too.
> For eg :
> ceph pg  55.80 query
> --
> ---
> ---
>  {
>             "peer": "83(3)",
>             "pgid": "55.80s3",
>             "last_update": "43360'15121927",
>             "last_complete": "43345'15073146",
>             "log_tail": "43335'15064480",
>             "last_user_version": 15066124,
>             "last_backfill": "MAX",
>             "last_backfill_bitwise": 1,
>             "purged_snaps": [],
>             "history": {
>                 "epoch_created": 5950,
>                 "epoch_pool_created": 5950,
>                 "last_epoch_started": 43339,
>                 "last_interval_started": 43338,
>                 "last_epoch_clean": 43340,
>                 "last_interval_clean": 43338,
>                 "last_epoch_split": 0,
>                 "last_epoch_marked_full": 42032,
>                 "same_up_since": 43338,
>                 "same_interval_since": 43338,
>                 "same_primary_since": 43276,
>                 "last_scrub": "35299'13072533",
>                 "last_scrub_stamp": "2018-01-18 14:01:19.557972",
>                 "last_deep_scrub": "31372'12176860",
>                 "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305",
>                 "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972"
>             },
>
> Not sure if it is related.
>
> The cluster is not open to any new clients. However, we see steady growth
> in space usage every day.
> In the worst case, it might grow faster than we can add capacity, which
> would be dangerous.
>
> Any help is really appreciated.
>
> Karun Josy
>
> On Fri, Jan 26, 2018 at 8:23 PM, David Turner <[email protected]>
> wrote:
>
>> "snap_trimq": "[]",
>>
>> That is exactly what you're looking for to see how many objects a PG
>> still has that need to be cleaned up. I think something like this should
>> give you the number of entries in the snap_trimq for a PG:
>>
>> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d]
>> -f1 | tr ',' '\n' | wc -l) - 1 ))
>>
>> Note, I'm not at a computer and typing this from my phone, so it's not
>> pretty and I know of a few ways to do it better, but it should work all
>> the same.
>>
>> For your needs a visual inspection of several PGs should be sufficient to
>> see if there is anything in the snap_trimq to begin with.
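[A slightly sturdier variant of the one-liner above can be factored into a shell function — a sketch using POSIX sed/awk that avoids the trailing `- 1` off-by-one; the sample lines in the demo mimic `ceph pg <pgid> query` output:]

```shell
# Extract the snap_trimq payload from `ceph pg <pgid> query` output and
# count its comma-separated entries (0 for an empty "[]").
count_snap_trimq() {
  sed -n 's/.*"snap_trimq": *"\[\([^]]*\)\]".*/\1/p' | awk -F',' '{ print NF }'
}

# Against a live cluster:
#   ceph pg 55.4a query | count_snap_trimq
# Demo on sample query lines:
printf '%s\n' '    "snap_trimq": "[]",' | count_snap_trimq         # prints 0
printf '%s\n' '    "snap_trimq": "[1~3,5~2]",' | count_snap_trimq  # prints 2
```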
>>
>> On Fri, Jan 26, 2018, 9:18 AM Karun Josy <[email protected]> wrote:
>>
>>>  Hi David,
>>>
>>> Thank you for the response. To be honest, I am afraid it is going to be
>>> an issue in our cluster.
>>> It seems snaptrim has not been running for some time now, maybe because
>>> we have been adding nodes to expand the cluster over the past few weeks.
>>>
>>> I would be really glad if you could guide me on how to overcome this.
>>> The cluster has about 30 TB of data and 11 million objects, with about
>>> 100 disks spread across 16 nodes. The version is 12.2.2.
>>> Searching through the mailing lists, I can see many cases where
>>> performance was affected while snaptrimming.
>>>
>>> Can you help me figure out the following:
>>>
>>> - How do I find the snaptrim queue of a PG?
>>> - Can snaptrim be started on just one PG?
>>> - How can I make sure cluster I/O performance is not affected?
>>>   I read about osd_snap_trim_sleep; how can it be changed?
>>>   Is this the command: ceph tell osd.* injectargs '--osd_snap_trim_sleep
>>> 0.005'
>>>
>>> If yes, what is the recommended value we can use?
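[For reference, runtime changes via injectargs are typically applied like this — a sketch where the value 0.1 is illustrative, not a recommendation, and injectargs changes do not survive an OSD restart:]

```shell
# Apply a snaptrim throttle to all OSDs at runtime (Luminous-era syntax).
ceph tell osd.\* injectargs '--osd_snap_trim_sleep 0.1'

# To persist across restarts, set it in ceph.conf under [osd]:
#   [osd]
#   osd_snap_trim_sleep = 0.1
```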
>>>
>>> Also, which parameters should we be concerned about? I would really
>>> appreciate any suggestions.
>>>
>>>
>>> Below is a brief extract of a PG queried
>>> ----------------------------
>>> ceph pg  55.77 query
>>> {
>>>     "state": "active+clean",
>>>     "snap_trimq": "[]",
>>> ---
>>> ----
>>>
>>> "pgid": "55.77s7",
>>>             "last_update": "43353'17222404",
>>>             "last_complete": "42773'16814984",
>>>             "log_tail": "42763'16812644",
>>>             "last_user_version": 16814144,
>>>             "last_backfill": "MAX",
>>>             "last_backfill_bitwise": 1,
>>>             "purged_snaps": [],
>>>             "history": {
>>>                 "epoch_created": 5950,
>>> ---
>>> ---
>>> ---
>>>
>>>
>>> Karun Josy
>>>
>>> On Fri, Jan 26, 2018 at 6:36 PM, David Turner <[email protected]>
>>> wrote:
>>>
>>>> You may find the information in this ML thread useful.
>>>> https://www.spinics.net/lists/ceph-users/msg41279.html
>>>>
>>>> It talks about a couple ways to track your snaptrim queue.
>>>>
>>>> On Fri, Jan 26, 2018 at 2:09 AM Karun Josy <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have set the noscrub and nodeep-scrub flags on a Ceph cluster.
>>>>> When we delete snapshots, we are not seeing any change in space
>>>>> usage.
>>>>>
>>>>> I understand that Ceph OSDs delete data asynchronously, so deleting a
>>>>> snapshot doesn't free up disk space immediately. But we have not seen
>>>>> any change for some time.
>>>>>
>>>>> What could be the possible reason? Any suggestions would be really
>>>>> helpful, as the cluster size seems to grow each day even though
>>>>> snapshots are deleted.
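[One quick check — a sketch against a live cluster; Luminous added dedicated PG states for trimming — is whether any PGs are actually in those states:]

```shell
# List PGs currently trimming or queued to trim (states added in Luminous).
ceph pg ls snaptrim
ceph pg ls snaptrim_wait

# If the noscrub/nodeep-scrub flags are the suspicion, they can be cleared:
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```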
>>>>>
>>>>>
>>>>> Karun
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> [email protected]
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>
>
