As I look at more of these stuck ops, it appears that more of them are waiting on subops than on osdmap updates, so maybe there is still some headway to be made with the weighted priority queue settings. I do see OSDs waiting for map updates all the time, but they aren’t blocking things as much as the subops are. Thoughts?
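For reference, a quick way to tally which state is blocking, using the admin socket on one OSD (the grep strings are based on the state text in my dumps and may vary by version):

    # count in-flight ops stuck waiting on subops vs. on an osdmap update
    ceph daemon osd.0 dump_ops_in_flight | grep -c 'waiting for subops'
    ceph daemon osd.0 dump_ops_in_flight | grep -c 'op must wait for map'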



________________________________
From: Steve Taylor
Sent: Tuesday, February 7, 2017 9:13 AM
To: 'ceph-users@lists.ceph.com' <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

Sorry, I lost the previous thread on this, so I apologize for the incomplete reply.

The issue that we’re having with Jewel, as David Turner mentioned, is that we can’t seem to throttle snap trimming sufficiently to prevent it from blocking I/O requests. On further investigation, I encountered osd_op_pq_max_tokens_per_priority, which, if I understand correctly, can be used in conjunction with ‘osd_op_queue = wpq’ to govern the availability of queue positions for various operations based on their costs. I’m testing with RBDs using 4MB objects, so in order to leave plenty of room in the weighted priority queue for client I/O, I set osd_op_pq_max_tokens_per_priority to 64MB and osd_snap_trim_cost to 32MB+1. I figured this should essentially reserve 32MB of the queue for client I/O operations, which are prioritized higher and therefore shouldn’t get blocked.
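For reference, here’s roughly what that looks like in ceph.conf (assuming the token units are bytes, matching message costs):

    [osd]
    osd_op_queue = wpq
    osd_op_pq_max_tokens_per_priority = 67108864   # 64MB of queue tokens
    osd_snap_trim_cost = 33554433                  # 32MB + 1: at most one trim fits, leaving ~32MB for client I/O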

I still see blocked I/O requests, and when I dump in-flight ops, they show ‘op 
must wait for map.’ I assume this means that what’s blocking the I/O requests 
at this point is all of the osdmap updates caused by snap trimming, and not the 
actual snap trimming itself starving the ops of op threads. Hammer is able to 
mitigate this with osd_snap_trim_sleep by directly throttling snap trimming and 
therefore causing less frequent osdmap updates, but there doesn’t seem to be a 
good way to accomplish the same thing with Jewel.
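(For reference, the command I’m using to dump the in-flight ops, in case anyone wants to check their own OSDs:

    ceph daemon osd.<id> dump_ops_in_flight

The ‘op must wait for map’ text shows up in the op state output there.)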

First of all, am I understanding these settings correctly? If so, are there other settings that could potentially help here, or do we just need something like what Sam already mentioned that can effectively reserve threads for client I/O requests? Even then it seems like we might have issues if we can’t also throttle snap trimming. We delete a LOT of RBD snapshots on a daily basis, which we recognize is an extreme use case. Just wondering if there’s something else to try, or if we need to start working toward implementing something new ourselves to handle our use case better.