On Fri, 2011-03-11 at 11:37 -0700, Sage Weil wrote:
> On Fri, 11 Mar 2011, Jim Schutt wrote:
> > So none of those were osd_ping messages.
> >
> > But, I still had lots of delayed acks. Here's a couple more examples:
> >
> > osd.0.log:960713:2011-03-11 10:55:32.117721 7fc4cb7fe940 --
> > 172.17.40.21:6802/28363 --> osd74 172.17.40.31:6808/24916 -- osd_ping(e14
> > as_of 14) v1 -- ?+0 0x7fc4cc16f270
> > osd.0.log:960756:2011-03-11 10:55:32.118395 7fc4c96eb940 --
> > 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53
> > pgs=47 cs=1 l=0).writer encoding 289 0x7fc4cc16f270 osd_ping(e14 as_of 14)
> > v1
> > osd.0.log:960757:2011-03-11 10:55:32.118409 7fc4c96eb940 --
> > 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53
> > pgs=47 cs=1 l=0).writer sending 289 0x7fc4cc16f270
> > osd.0.log:960758:2011-03-11 10:55:32.118422 7fc4c96eb940 --
> > 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53
> > pgs=47 cs=1 l=0).write_message 0x7fc4cc16f270 osd_ping(e14 as_of 14) v1
>
> This bit has me confused:
>
> > osd.0.log:963163:2011-03-11 10:55:32.941413 7fc4c61b6940 --
> > 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53
> > pgs=47 cs=1 l=0).reader got ack seq 289 >= 289 on 0x7fc4cc16f270
> > osd_ping(e14 as_of 14) v1
> > osd.0.log:964273:2011-03-11 10:55:33.447526 7fc4c61b6940
> > RefCountedObject::put delete 0x7fc4cc16f270 took 5.06013 secs!
> > osd.0.log:964274:2011-03-11 10:55:33.447538 7fc4c61b6940 --
> > 172.17.40.21:6802/28363 >> 172.17.40.31:6808/24916 pipe(0x28fb580 sd=53
> > pgs=47 cs=1 l=0).handle_ack finished put on 0x7fc4cc16f270
>
> It looks like ~ .5 seconds has gone by for that thread, but the ::put
> debug says 5 seconds. It happens between the 'got ack seq' and 'finished
> put' lines, though, right?
I'm also confused. Here's the code I ran:
void put() {
//generic_dout(0) << "RefCountedObject::put " << this << " " << nref.read()
<< " -> " << (nref.read() - 1) << dendl;
if (nref.dec() == 0) {
utime_t s = g_clock.now();
delete this;
utime_t e = g_clock.now();
if (double(e - s) > 0.5) {
generic_dout(1) << "RefCountedObject::put delete " << this << " took "
<< double(e - s) << " secs!" << dendl;
}
}
}
I added those double casts, because I had a similar problem with my
throttler->put
test: without the casts, it was firing but the reported delay was less than
0.25 sec. Adding the casts stopped that - I haven't yet checked into why.
Still checking for what I'm missing...
-- Jim
>
> sage
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html