[ceph-users] Delays while waiting_for_osdmap according to dump_historic_ops

Alex Moore Sun, 07 Sep 2014 16:29:54 -0700

I recently found out about the "ceph --admin-daemon/var/run/ceph/ceph-osd.<id>.asok dump_historic_ops" command, and noticedsomething unexpected in the output on my cluster, after checkingnumerous output samples...


It looks to me like "normal" write ops on my cluster spend roughly:


<1ms between "received_at" and "waiting_for_osdmap"
<1ms between "waiting_for_osdmap" and "reached_pg"
<15ms between "reached_pg" and "commit_sent"
<15ms between "commit_sent" and "done"

For reference, this is a small (3-host) all-SSD cluster, with monitorsco-located with OSDs. Each host has: 1 SSD for the OS, 1 SSD for thejournal, and 1 SSD for the OSD + monitor data (I initially had themonitor data on the same drive as the OS, but encountered performanceproblems - which have since been allieviated by moving the monitor datato the same drives as the OSDs. Networking is infiniband (8 Gbpsdedicated point-to-point link between each pair of hosts). I'm runningv0.80.5. And the OSDs use XFS.

Anyway, as this command intentionally shows the worst few recent IOs, Ionly rarely see examples that match the above "norm". Rather, thetypical outliers that it highlights are usually write IOs with~100-300ms latency, where the extra latency exists purely between the"received_at" and "reached_pg" timestamps, and mostly in the"waiting_for_osdmap" step. Also it looks like these slow IOs come inbatches. Every write IO arriving within the same ~1 second period willsuffer from these strangely slow initial two steps, with the additionallatency being almost identical for each one within the same batch. Afterwhich things return to normal again in that those steps take <1ms. Socompared to the above "norm", these look more like:


~50ms between "received_at" and "waiting_for_osdmap"
~150ms between "waiting_for_osdmap" and "reached_pg"
<15ms between "reached_pg" and "commit_sent"
<15ms between "commit_sent" and "done"

This seems unexpected to me. I don't see why those initial steps in theIO should ever take such a long time to complete. Where should I belooking next to track down the cause? I'm guessing that"waiting_for_osdmap" involves OSD<->Mon communication, and so perhapsindicates poor performance of the Mons. But for there to be anynon-negligible delay between "received_at" and "waiting_for_osdmap"makes no sense to me at all.


Here are a couple of examples...

This is a rare example (from the perspective of the dump_historic_opsoutput) of what I believe to be a "more normal" write IO:

{ "description": "osd_op(client.10697.0:560068rbd_data.1a1a2ae8944a.0000000000000722 [] 2.1ecc1a2a ack+ondisk+writee595)",

          "received_at": "2014-09-07 23:55:58.771015",
          "age": "9.101564",
          "duration": "0.010586",
          "type_data": [
                "commit sent; apply or cleanup",
                { "client": "client.10697",
                  "tid": 560068},
                [
                    { "time": "2014-09-07 23:55:58.771391",
                      "event": "waiting_for_osdmap"},
                    { "time": "2014-09-07 23:55:58.771895",
                      "event": "reached_pg"},
                    { "time": "2014-09-07 23:55:58.771978",
                      "event": "started"},
                    { "time": "2014-09-07 23:55:58.771994",
                      "event": "started"},
                    { "time": "2014-09-07 23:55:58.772041",
                      "event": "waiting for subops from 1,2"},
                    { "time": "2014-09-07 23:55:58.772210",
                      "event": "commit_queued_for_journal_write"},
                    { "time": "2014-09-07 23:55:58.775767",
                      "event": "write_thread_in_journal_buffer"},
                    { "time": "2014-09-07 23:55:58.777814",
                      "event": "journaled_completion_queued"},
                    { "time": "2014-09-07 23:55:58.777837",
                      "event": "op_commit"},
                    { "time": "2014-09-07 23:55:58.778504",
                      "event": "op_applied"},
                    { "time": "2014-09-07 23:55:58.779218",
                      "event": "sub_op_commit_rec"},
                    { "time": "2014-09-07 23:55:58.781480",
                      "event": "sub_op_commit_rec"},
                    { "time": "2014-09-07 23:55:58.781573",
                      "event": "commit_sent"},
                    { "time": "2014-09-07 23:55:58.781601",
                      "event": "done"}]]},

Whereas this is more typical of what I normally see in the output from"dump_historic_ops":

{ "description": "osd_op(client.10737.0:877375rbd_data.25212ae8944a.00000000000001c0 [] 2.5493ce40 ack+ondisk+writee595)",

          "received_at": "2014-09-07 23:57:16.444904",
          "age": "84.769560",
          "duration": "0.269918",
          "type_data": [
                "commit sent; apply or cleanup",
                { "client": "client.10737",
                  "tid": 877375},
                [
                    { "time": "2014-09-07 23:57:16.520486",
                      "event": "waiting_for_osdmap"},
                    { "time": "2014-09-07 23:57:16.681982",
                      "event": "reached_pg"},
                    { "time": "2014-09-07 23:57:16.682030",
                      "event": "started"},
                    { "time": "2014-09-07 23:57:16.682049",
                      "event": "started"},
                    { "time": "2014-09-07 23:57:16.682119",
                      "event": "waiting for subops from 1,2"},
                    { "time": "2014-09-07 23:57:16.682292",
                      "event": "commit_queued_for_journal_write"},
                    { "time": "2014-09-07 23:57:16.682370",
                      "event": "write_thread_in_journal_buffer"},
                    { "time": "2014-09-07 23:57:16.690089",
                      "event": "journaled_completion_queued"},
                    { "time": "2014-09-07 23:57:16.691201",
                      "event": "op_commit"},
                    { "time": "2014-09-07 23:57:16.692730",
                      "event": "sub_op_commit_rec"},
                    { "time": "2014-09-07 23:57:16.700063",
                      "event": "sub_op_commit_rec"},
                    { "time": "2014-09-07 23:57:16.700105",
                      "event": "commit_sent"},
                    { "time": "2014-09-07 23:57:16.714792",
                      "event": "op_applied"},
                    { "time": "2014-09-07 23:57:16.714822",
                      "event": "done"}]]}]}

Alex
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] Delays while waiting_for_osdmap according to dump_historic_ops

Reply via email to