Re: [ceph-users] Intepreting reason for blocked request

Bryan Henderson Sat, 19 May 2018 11:48:06 -0700

>>> 2018-05-03 01:56:35.249122 osd.0 192.168.1.16:6800/348 54 :
>>>   cluster [WRN] slow request 961.557151 seconds old,
>>>   received at 2018-05-03 01:40:33.689191:
>>>     pg_query(4.f epoch 490) currently wait for new map
>>>
>
>The OSD is waiting for a new OSD map, which it will get from one of its
>peers or the monitor (by request). This tends to happen if the client sees
>a newer version than the OSD does.


Hmmm.  So the client gets the current OSD map from the Monitor and then
indicates in its request to the OSD what map epoch it is using?  And if the
OSD has an older map, it requests a new one from another OSD or Monitor before
proceeding?  And I suppose if the map it receives is still older than the epoch
the client stated, the OSD keeps trying until it gets that epoch.
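
To make sure I have the mechanism straight, here is a toy model of it in
Python.  This is only a sketch of my understanding as described above, not
Ceph code: the ToyOSD class, its method names, and the starting epoch 489 are
all made up for illustration (only the epoch 490 comes from the log message).

```python
from dataclasses import dataclass, field


@dataclass
class ToyOSD:
    """Toy stand-in for an OSD; hypothetical, not Ceph code."""
    map_epoch: int
    waiting: list = field(default_factory=list)  # requests parked for a newer map

    def handle_request(self, name, required_epoch):
        """Run the request if our map is new enough, otherwise park it."""
        if required_epoch <= self.map_epoch:
            return f"{name}: processed at epoch {self.map_epoch}"
        self.waiting.append((name, required_epoch))
        return f"{name}: waiting for map epoch {required_epoch}"

    def receive_map(self, new_epoch):
        """Adopt a newer map and return the names of requests it unblocks."""
        self.map_epoch = max(self.map_epoch, new_epoch)
        ready = [n for n, e in self.waiting if e <= self.map_epoch]
        self.waiting = [(n, e) for n, e in self.waiting if e > self.map_epoch]
        return ready


osd = ToyOSD(map_epoch=489)                      # 489 is an assumed starting epoch
print(osd.handle_request("pg_query(4.f)", 490))  # request names epoch 490, so it parks
print(osd.receive_map(489))                      # still too old, nothing unblocked
print(osd.receive_map(490))                      # now the request can proceed
```

In this model a request stays parked forever if no map with the required epoch
ever arrives, which is the behavior I seem to be seeing.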

If that's so, this situation could happen if for some reason the client got
the idea that there's a newer map than there really is.

What I'm looking at is probably just a Ceph bug, because this small test
cluster got into this state immediately upon startup, before any client had
connected (I assume these blocked requests are from inside the cluster), and
the requests aren't just blocked for a long time; they're blocked
indefinitely.  The only time I've seen it is when I brought the cluster up in
a different order than I usually do.  So I'm just trying to understand the
inner workings in case I need to debug it if it keeps happening.
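
For that debugging, a small helper like the following could pull the
interesting fields out of a slow request warning.  It is a sketch that assumes
the message always has the shape of the one quoted above; the regular
expression and the function name parse_slow_request are mine, not anything
from Ceph, and other Ceph versions may format the message differently.

```python
import re

# Pattern written against the single log message quoted in this thread.
SLOW_REQUEST = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"(?P<op>.+) currently (?P<state>.+)$"
)


def parse_slow_request(text):
    """Return a dict of fields from a slow request warning, or None."""
    # The message may be wrapped across lines, so collapse all whitespace first.
    line = " ".join(text.split())
    match = SLOW_REQUEST.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["age"] = float(info["age"])
    return info


sample = """2018-05-03 01:56:35.249122 osd.0 192.168.1.16:6800/348 54 :
  cluster [WRN] slow request 961.557151 seconds old,
  received at 2018-05-03 01:40:33.689191:
    pg_query(4.f epoch 490) currently wait for new map"""

print(parse_slow_request(sample))
```

Grouping the parsed messages by the "state" and "op" fields should show whether
every blocked request is waiting on the same map epoch.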

-- 
Bryan Henderson                                   San Jose, California
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
