Guys,
I'm running a three-node cluster (version 0.53). After running for a
while under constant write load generated by two daemons, I see one
request that is completely blocked:
[WRN] 1 slow requests, 1 included below; oldest blocked for > 7550.891933 secs
2012-10-29 10:33:54.689563 osd.0 [WRN] slow request 7550.891933
seconds old, received at 2012-10-29 08:28:03.797576:
osd_sub_op(client.4116.0:490 0.3e
e3aa943e//logger/pg/data/2012-10-29/BWBCK/1351524240/head//0 [] v
13'37 snapset=0=[]:[] snapc=0=[]) v7 currently started
ceph --admin-daemon /path/to/osd.1.asok dump_ops_in_flight gives:
"ops": [
    { "description": "osd_sub_op(client.4116.0:490 0.3e e3aa943e\/\/logger\/pg\/data\/2012-10-29\/BWBCK\/1351524240\/head\/\/0 [] v 13'37 snapset=0=[]:[] snapc=0=[])",
      "received_at": "2012-10-29 08:28:03.797576",
      "age": "8348.393528",
      "duration": "0.045426",
      "flag_point": "started",
      "events": [
            { "time": "2012-10-29 08:28:03.805648",
              "event": "waiting_for_osdmap"},
            { "time": "2012-10-29 08:28:03.806203",
              "event": "reached_pg"},
            { "time": "2012-10-29 08:28:03.806222",
              "event": "started"},
            { "time": "2012-10-29 08:28:03.806299",
              "event": "commit_queued_for_journal_write"},
            { "time": "2012-10-29 08:28:03.807905",
              "event": "write_thread_in_journal_buffer"},
            { "time": "2012-10-29 08:28:03.808154",
              "event": "journaled_completion_queued"},
            { "time": "2012-10-29 08:28:03.809422",
              "event": "sub_op_commit"},
            { "time": "2012-10-29 08:28:03.843002",
              "event": "sub_op_applied"}]}]}
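
Note that "age" is over 8000 seconds while "duration" is about 0.045
seconds, and the last recorded event is sub_op_applied. A rough Python
sketch of how output like this could be scanned for such long-lived ops
is below. It assumes the complete dump is a valid JSON object with an
"ops" list whose entries carry "age", "description" and "events" as in
the output above; the function name find_stuck_ops and the 30-second
default threshold are made up for illustration, and the sample
description is abbreviated.

```python
import json

# Abbreviated sample shaped like the dump_ops_in_flight output above.
SAMPLE = r'''
{ "ops": [
    { "description": "osd_sub_op(client.4116.0:490 0.3e ...)",
      "received_at": "2012-10-29 08:28:03.797576",
      "age": "8348.393528",
      "duration": "0.045426",
      "flag_point": "started",
      "events": [
        { "time": "2012-10-29 08:28:03.809422", "event": "sub_op_commit"},
        { "time": "2012-10-29 08:28:03.843002", "event": "sub_op_applied"}]}]}
'''


def find_stuck_ops(dump_text, min_age_secs=30.0):
    """Return a summary of every op whose age is at least min_age_secs.

    Hypothetical helper: the field names are taken from the output
    above, not from any documented schema.
    """
    dump = json.loads(dump_text)
    stuck = []
    for op in dump.get("ops", []):
        age = float(op["age"])  # reported as a string in the output above
        if age < min_age_secs:
            continue
        events = op.get("events", [])
        stuck.append({
            "description": op.get("description"),
            "age": age,
            "duration": float(op.get("duration", 0.0)),
            "last_event": events[-1]["event"] if events else None,
        })
    return stuck


if __name__ == "__main__":
    for op in find_stuck_ops(SAMPLE):
        print(op["description"], op["age"], op["last_event"])
```

The text passed in would be whatever the dump_ops_in_flight command
above prints for a given OSD socket.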
Restarting the OSD clears this request. Is this a bug, and is there a
way to cancel a request without restarting the OSD?
Thanks,
Ian