Hi,
Inspired by all the other performance threads going on here, I started
investigating on my own as well.
I’m seeing a lot of iowait on the OSDs, while the journals are only 2-7%
utilised, at about 8-30MB/s (mostly around 8MB/s write). This is a Dumpling
cluster. The goal here is to get journal utilisation up to maybe 50%.
Journals: Intel DC S3700, OSD: HGST 4TB
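If anyone wants to check the same numbers: the journal utilisation figure is
just iostat’s %util on the journal device, something like this (sdb is only an
example, adjust for your layout):

  # extended stats every second for the journal device; %util is the
  # column of interest, avgqu-sz/await show queueing and latency
  iostat -x sdb 1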
I did some initial testing to make the wbthrottle keep more in the buffer, and
I think I managed to (settings below), though it didn’t affect the journal
utilisation.
There are 12 cores per machine for the 10 OSDs to utilise, and they use about
20% of them, so I guess there’s no bottleneck there.
Well, that’s the problem: I really can’t see any bottleneck with the current
layout. Maybe it’s our copper 10Gb that’s giving us too much latency?
It would be nice to have some kind of bottleneck troubleshooting guide in the
Ceph docs :)
I’m guessing I’m not the only one on these kinds of specs, and it would be
interesting to see if there’s optimisation to be done.
Hope you guys have a nice weekend :)
Cheers,
Josef
Ping from a host to an OSD:
6 packets transmitted, 6 received, 0% packet loss, time 4998ms
rtt min/avg/max/mdev = 0.063/0.107/0.193/0.048 ms
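The RTT looks sane to me, but to rule out the 10Gb link under load I’m
thinking of running a quick iperf between two of the hosts as well (host name
is a placeholder):

  # on one OSD host
  iperf -s
  # from another host, 30 second run
  iperf -c osd-host-1 -t 30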
Settings on the OSD:
{ "filestore_wbthrottle_xfs_ios_start_flusher": "5000"}
{ "filestore_wbthrottle_xfs_inodes_start_flusher": "5000"}
{ "filestore_wbthrottle_xfs_ios_hard_limit": "10000"}
{ "filestore_wbthrottle_xfs_inodes_hard_limit": "10000"}
{ "filestore_max_sync_interval": "30”}
The defaults, for comparison:
{ "filestore_wbthrottle_xfs_ios_start_flusher": "500"}
{ "filestore_wbthrottle_xfs_inodes_start_flusher": "500"}
{ "filestore_wbthrottle_xfs_ios_hard_limit": “5000"}
{ "filestore_wbthrottle_xfs_inodes_hard_limit": “5000"}
{ "filestore_max_sync_interval": “5”}
A single dump_historic_ops entry:
{ "description": "osd_op(client.47765822.0:99270434
rbd_data.1da982c2eb141f2.0000000000005825 [stat,write 2093056~8192] 3.8130048c
e19290)",
"rmw_flags": 6,
"received_at": "2015-04-26 08:24:03.226255",
"age": "87.026653",
"duration": "0.801927",
"flag_point": "commit sent; apply or cleanup",
"client_info": { "client": "client.47765822",
"tid": 99270434},
"events": [
{ "time": "2015-04-26 08:24:03.226329",
"event": "waiting_for_osdmap"},
{ "time": "2015-04-26 08:24:03.230921",
"event": "reached_pg"},
{ "time": "2015-04-26 08:24:03.230928",
"event": "started"},
{ "time": "2015-04-26 08:24:03.230931",
"event": "started"},
{ "time": "2015-04-26 08:24:03.231791",
"event": "waiting for subops from [22,48]"},
{ "time": "2015-04-26 08:24:03.231813",
"event": "commit_queued_for_journal_write"},
{ "time": "2015-04-26 08:24:03.231849",
"event": "write_thread_in_journal_buffer"},
{ "time": "2015-04-26 08:24:03.232075",
"event": "journaled_completion_queued"},
{ "time": "2015-04-26 08:24:03.232492",
"event": "op_commit"},
{ "time": "2015-04-26 08:24:03.233134",
"event": "sub_op_commit_rec"},
{ "time": "2015-04-26 08:24:03.233183",
"event": "op_applied"},
{ "time": "2015-04-26 08:24:04.028167",
"event": "sub_op_commit_rec"},
{ "time": "2015-04-26 08:24:04.028174",
"event": "commit_sent"},
{ "time": "2015-04-26 08:24:04.028182",
"event": "done"}]},