Hi,

Can you take a look at https://github.com/ceph/ceph/pull/2363 and see if
that addresses the behavior you saw?

Thanks!
sage


On Fri, 29 Aug 2014, Sage Weil wrote:

> Hi,
>
> I've opened http://tracker.ceph.com/issues/9285 to track this.
>
> I think you're right--we need a check in agent_maybe_evict() that will
> skip objects that are being promoted.  I suspect a flag on the
> ObjectContext is enough?
>
> sage
>
>
> On Fri, 29 Aug 2014, Wang, Zhiqiang wrote:
>
> > Hi all,
> >
> > I ran into this slow request issue some time ago. The problem is this:
> > when running with cache tiering, 'slow request' warning messages like
> > the ones below appear in the log file.
> >
> > 2014-08-29 10:18:24.669763 7f9b20f1b700 0 log [WRN] : 1 slow requests, 1
> > included below; oldest blocked for > 30.996595 secs
> > 2014-08-29 10:18:24.669768 7f9b20f1b700 0 log [WRN] : slow request
> > 30.996595 seconds old, received at 2014-08-29 10:17:53.673142:
> > osd_op(client.114176.0:144919 rb.0.17f56.6b8b4567.000000000935 [sparse-read
> > 3440640~4096] 45.cf45084b ack+read e26168) v4 currently waiting for rw locks
> >
> > Recently I made some changes to the logging, captured this problem, and
> > finally figured out its root cause. You can check the attachment for the
> > logs.
> >
> > Here is the root cause:
> > There is a cache miss on a read. During promotion, after copying the
> > data from the base tier osd, the cache tier primary osd replicates the
> > data to the other cache tier osds. Sometimes this takes quite a long
> > time. During this period, the promoted object may be evicted because the
> > cache tier is full. When the primary osd finally gets the replication
> > response and restarts the original read request, it doesn't find the
> > object in the cache tier, and does the promotion again. This loops
> > several times, and we see the 'slow request' in the logs. Theoretically,
> > this could loop forever, and the request from the client would never
> > finish.
> >
> > There is a simple fix for this:
> > Add a field to the object state, indicating the status of the promotion.
> > It is set to true after the data is copied from the base tier and before
> > the replication. It is reset to false after the replication completes
> > and the original client request starts to execute. Eviction is not
> > allowed while this field is true.
> >
> > What do you think?
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
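
For anyone following the thread, the promote/evict race and the proposed
flag can be sketched as a toy model. This is only an illustration under
stated assumptions, not the Ceph code and not what the pull request does:
the names agent_maybe_evict and ObjectContext are borrowed from the
discussion above, while CacheTier, promoting, honor_promoting_flag, read and
max_promotions are hypothetical. Replication is modeled as a single step
during which the eviction agent always visits the object.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectContext:
    """Toy stand-in for per-object state; 'promoting' is the proposed flag."""
    name: str
    promoting: bool = False


@dataclass
class CacheTier:
    """Hypothetical cache tier that is always full, so eviction always runs."""
    honor_promoting_flag: bool
    objects: dict = field(default_factory=dict)

    def agent_maybe_evict(self, name: str) -> bool:
        """Evict 'name' unless it is mid-promotion and the check is enabled."""
        obc = self.objects.get(name)
        if obc is None:
            return False
        if self.honor_promoting_flag and obc.promoting:
            return False  # skip objects that are being promoted
        del self.objects[name]
        return True

    def read(self, name: str, max_promotions: int = 5) -> int:
        """Return how many promotions were needed before the read hit.

        Raises RuntimeError if the read still misses after max_promotions,
        which stands in for the 'slow request' loop described above.
        """
        promotions = 0
        while name not in self.objects:
            if promotions == max_promotions:
                raise RuntimeError("read never completed")
            # 1. Copy the data from the base tier; mark promotion in progress.
            self.objects[name] = ObjectContext(name, promoting=True)
            promotions += 1
            # 2. Replication to the other cache tier osds is slow; the cache
            #    is full, so the agent tries to evict the object meanwhile.
            self.agent_maybe_evict(name)
            # 3. Replication finished: clear the flag, then retry the read.
            if name in self.objects:
                self.objects[name].promoting = False
        return promotions
```

With the check enabled, CacheTier(honor_promoting_flag=True).read("obj")
needs a single promotion. With it disabled, the object is evicted during
every replication and the read gives up after max_promotions attempts. The
sketch does not model the window between clearing the flag and the client
request actually executing.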
